DBA 脑子要咋长才行啊....

V2EX = way to explore

V2EX 是一个关于分享和探索的地方

现在注册

已注册用户请登录

这是一个创建于 2018 天前的主题，其中的信息可能已经有所发展或是发生改变。

PG 里两条一样的语句,时间跨度不同,结果优化成完全不同的执行方式....

语句语句用于计算：当天注册的人的登陆次数,一张注册表日志,一张登陆日志表 group by 用于合并渠道和游戏 id. 为了避免无法去重没用 distinct

select one_day, count() login_count, ROUND((count() / sum(count(*)) over(PARTITION BY one_day, app_id order by app_id))*100, 2) app_percent, sub_channel, app_id from (select reg.reg_day as one_day, login.app_id as app_id, login.sub_channel as sub_channel, reg.user_id from (select count_time, date_trunc('day', count_time) as reg_day, user_id from t_l_register_log where count_time >= '2019-03-03 00:00:00' and count_time < '2019-04-04 00:00:00' order by reg_day) as reg inner join (select date_trunc('day', count_time) as login_day ,sub_channel, app_id, user_id from t_l_platform_login_log where count_time >= '2019-03-03 00:00:00' and count_time < '2019-04-04 00:00:00' group by login_day, app_id, sub_channel, user_id order by login_day) as login on date_trunc('day', reg.count_time) = login_day and reg.user_id = login.user_id group by one_day, app_id, sub_channel, reg.user_id) as tb group by one_day, app_id, sub_channel order by one_day, app_id, login_count desc, app_percent desc;

这个是能这却执行的 sql,时间跨度 1 年

------------------------------------------------------------------------------------------
 Sort  (cost=97561067.23..97581067.23 rows=8000000 width=56) (actual time=34585.894..34586.519 rows=18517 loops=1)
   Output: reg.reg_day, (count(*)), (round(((((count(*)))::numeric / sum((count(*))) OVER (?)) * '100'::numeric), 2)), login.sub_channel, login.app_id
   Sort Key: reg.reg_day, login.app_id, (count(*)) DESC, (round(((((count(*)))::numeric / sum((count(*))) OVER (?)) * '100'::numeric), 2)) DESC
   Sort Method: quicksort  Memory: 2215kB
   ->  WindowAgg  (cost=83444220.45..96096929.48 rows=8000000 width=56) (actual time=33708.198..34578.031 rows=18517 loops=1)
         Output: reg.reg_day, (count(*)), round(((((count(*)))::numeric / sum((count(*))) OVER (?)) * '100'::numeric), 2), login.sub_channel, login.app_id
         ->  GroupAggregate  (cost=83444220.45..95856929.48 rows=8000000 width=24) (actual time=33708.188..34562.551 rows=18517 loops=1)
               Output: reg.reg_day, login.sub_channel, login.app_id, count(*)
               Group Key: reg.reg_day, login.app_id, login.sub_channel
               ->  Group  (cost=83444220.45..87555123.46 rows=328872241 width=28) (actual time=33708.179..34341.267 rows=3198877 loops=1)
                     Output: reg.reg_day, login.app_id, login.sub_channel, reg.user_id, login.app_id, login.sub_channel
                     Group Key: reg.reg_day, login.app_id, login.sub_channel, reg.user_id
                     ->  Sort  (cost=83444220.45..84266401.05 rows=328872241 width=20) (actual time=33708.175..33958.476 rows=3198877 loops=1)
                           Output: reg.reg_day, reg.user_id, login.app_id, login.sub_channel
                           Sort Key: reg.reg_day, login.app_id, login.sub_channel, reg.user_id
                           Sort Method: external merge  Disk: 93944kB
                           ->  Merge Join  (cost=10070881.07..16687034.80 rows=328872241 width=20) (actual time=28498.899..31904.531 rows=3198877 loops=1)
                                 Output: reg.reg_day, reg.user_id, login.app_id, login.sub_channel
                                 Merge Cond: (((date_trunc('day'::text, reg.count_time)) = login.login_day) AND (reg.user_id = login.user_id))
                                 ->  Sort  (cost=876809.36..883338.44 rows=2611632 width=20) (actual time=2665.030..2929.604 rows=3204477 loops=1)
                                       Output: reg.reg_day, reg.user_id, reg.count_time, (date_trunc('day'::text, reg.count_time))
                                       Sort Key: (date_trunc('day'::text, reg.count_time)), reg.user_id
                                       Sort Method: external merge  Disk: 131744kB
                                       ->  Subquery Scan on reg  (cost=142257.54..491333.83 rows=2611632 width=20) (actual time=376.086..1283.740 rows=3204477 loops=1)
                                             Output: reg.reg_day, reg.user_id, reg.count_time, date_trunc('day'::text, reg.count_time)
                                             ->  Gather Merge  (cost=142257.54..465217.51 rows=2611632 width=20) (actual time=376.081..815.735 rows=3204477 loops=1)
                                                   Output: _hyper_8_974_chunk.count_time, (date_trunc('day'::text, _hyper_8_974_chunk.count_time)), _hyper_8_974_chunk.user_id
                                                   Workers Planned: 8
                                                   Workers Launched: 8
                                                   ->  Sort  (cost=141257.40..142073.54 rows=326454 width=20) (actual time=346.654..379.105 rows=356053 loops=9)
                                                         Output: _hyper_8_974_chunk.count_time, (date_trunc('day'::text, _hyper_8_974_chunk.count_time)), _hyper_8_974_chunk.user_id
                                                         Sort Key: (date_trunc('day'::text, _hyper_8_974_chunk.count_time))
                                                         Sort Method: external merge  Disk: 10176kB

这个是没有正确执行的 sql,时间跨度一个月,不知道要跑多少时间

---------------------------------------------------------------------------
 Sort  (cost=1378360.26..1378362.88 rows=1046 width=56)
   Output: reg.reg_day, (count(*)), (round(((((count(*)))::numeric / sum((count(*))) OVER (?)) * '100'::numeric), 2)), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id
   Sort Key: reg.reg_day, _hyper_4_1229_chunk.app_id, (count(*)) DESC, (round(((((count(*)))::numeric / sum((count(*))) OVER (?)) * '100'::numeric), 2)) DESC
   ->  WindowAgg  (cost=1377873.71..1378307.80 rows=1046 width=56)
         Output: reg.reg_day, (count(*)), round(((((count(*)))::numeric / sum((count(*))) OVER (?)) * '100'::numeric), 2), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id
         ->  GroupAggregate  (cost=1377873.71..1378276.42 rows=1046 width=24)
               Output: reg.reg_day, _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id, count(*)
               Group Key: reg.reg_day, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel
               ->  Group  (cost=1377873.71..1378004.46 rows=10460 width=28)
                     Output: reg.reg_day, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel, reg.user_id, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel
                     Group Key: reg.reg_day, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel, reg.user_id
                     ->  Sort  (cost=1377873.71..1377899.86 rows=10460 width=20)
                           Output: reg.reg_day, reg.user_id, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel
                           Sort Key: reg.reg_day, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel, reg.user_id
                           ->  Merge Join  (cost=652242.46..1377175.37 rows=10460 width=20)
                                 Output: reg.reg_day, reg.user_id, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel
                                 Merge Cond: ((date_trunc('day'::text, _hyper_4_1229_chunk.count_time)) = (date_trunc('day'::text, reg.count_time)))
                                 Join Filter: (reg.user_id = _hyper_4_1229_chunk.user_id)
                                 ->  Group  (cost=650558.24..1318952.53 rows=979872 width=20)
                                       Output: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.user_id
                                       Group Key: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.user_id
                                       ->  Gather Merge  (cost=650558.24..1267509.25 rows=4899360 width=20)
                                             Output: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.user_id
                                             Workers Planned: 5
                                             ->  Group  (cost=649558.16..676504.67 rows=979872 width=20)
                                                   Output: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.user_id
                                                   Group Key: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.user_id
                                                   ->  Sort  (cost=649558.16..654457.53 rows=1959746 width=20)
                                                         Output: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.user_id
                                                         Sort Key: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.user_id
                                                         ->  Result  (cost=0.00..404552.30 rows=1959746 width=20)
                                                               Output: date_trunc('day'::text, _hyper_4_1229_chunk.count_time), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.user_id
                                                               ->  Parallel Append  (cost=0.00..380055.48 rows=1959746 width=20)

正确语句用了日期和 userid 排序, 然后正确的双排序 join 了不正确的语句没用日期排序,最后导致了 M*N?

有人能帮我解释下是不是这个理啊 orz

DBA 脑子要要怎么长才能了了解数据库的各种坑啊......

这 DBA 要怎么干啊 orz,我光测一条语句就要疯了...

app_id

Rows

unt

1 条回复 • 2019-07-11 10:36:37 +08:00

CallMeReznov

2019-07-11 10:36:37 +08:00

DBA 是另外一个物种