您现在的位置是:网站首页> 编程资料编程资料

PostgreSQL 对IN,EXISTS,ANY/ALL,JOIN的sql优化方案_PostgreSQL_

2023-05-27 546人已围观

简介 PostgreSQL 对IN,EXISTS,ANY/ALL,JOIN的sql优化方案_PostgreSQL_

测试环境:

 postgres=# select version(); version --------------------------------------------------------------------------------------------------------- PostgreSQL 11.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit (1 row) postgres=#

数据准备:

 $ pgbench -i -s 10
 postgres=# \d List of relations Schema | Name | Type | Owner --------+------------------+-------+---------- public | pgbench_accounts | table | postgres public | pgbench_branches | table | postgres public | pgbench_history | table | postgres public | pgbench_tellers | table | postgres (4 rows) postgres=# select * from pgbench_accounts limit 1; aid | bid | abalance | filler -----+-----+----------+-------------------------------------------------------------------------------------- 1 | 1 | 0 | (1 row) postgres=# select * from pgbench_branches limit 1; bid | bbalance | filler -----+----------+-------- 1 | 0 | (1 row) postgres=# select * from pgbench_history limit 1; tid | bid | aid | delta | mtime | filler -----+-----+-----+-------+-------+-------- (0 rows) postgres=# select * from pgbench_tellers limit 1; tid | bid | tbalance | filler -----+-----+----------+-------- 1 | 1 | 0 | (1 row) postgres=# select * from pgbench_branches; bid | bbalance | filler -----+----------+-------- 1 | 0 | 2 | 0 | 3 | 0 | 4 | 0 | 5 | 0 | 6 | 0 | 7 | 0 | 8 | 0 | 9 | 0 | 10 | 0 | (10 rows) postgres=# update pgbench_branches set bbalance=4500000 where bid in (4,7); UPDATE 2 postgres=#

IN语句

查询要求:找出那些余额(balance)大于0的每个分支(branch)在表在pgbench_accounts中有多少个账户

1.使用IN子句 

 SELECT count( aid ),bid FROM pgbench_accounts WHERE bid IN ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 ) GROUP BY bid; 

2.使用ANY子句

 SELECT count( aid ),bid FROM pgbench_accounts WHERE bid = ANY ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 ) GROUP BY bid;

  

3.使用EXISTS子句

 SELECT count( aid ),bid FROM pgbench_accounts WHERE EXISTS ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 AND pgbench_accounts.bid = pgbench_branches.bid ) GROUP BY bid;

  

4.使用INNER JOIN

 SELECT count( aid ),a.bid FROM pgbench_accounts a JOIN pgbench_branches b ON a.bid = b.bid WHERE b.bbalance > 0 GROUP BY a.bid;

在完成这个查询要求的时候,有人可能会假设exists和inner join性能可能会更好,因为他们可以使用两表连接的逻辑和优化。而IN和ANY子句需要使用子查询。

然而,PostgreSQL(10版本之后)已经智能的足以对上面四种写法产生相同的执行计划!

所有上面的写法都会产生相同的执行计划:

 QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Finalize GroupAggregate (cost=23327.73..23330.26 rows=10 width=12) (actual time=97.199..99.014 rows=2 loops=1) Group Key: a.bid -> Gather Merge (cost=23327.73..23330.06 rows=20 width=12) (actual time=97.191..99.006 rows=6 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=22327.70..22327.73 rows=10 width=12) (actual time=93.762..93.766 rows=2 loops=3) Sort Key: a.bid Sort Method: quicksort Memory: 25kB Worker 0: Sort Method: quicksort Memory: 25kB Worker 1: Sort Method: quicksort Memory: 25kB -> Partial HashAggregate (cost=22327.44..22327.54 rows=10 width=12) (actual time=93.723..93.727 rows=2 loops=3) Group Key: a.bid -> Hash Join (cost=1.14..22119.10 rows=41667 width=8) (actual time=24.024..83.263 rows=66667 loops=3) Hash Cond: (a.bid = b.bid) -> Parallel Seq Scan on pgbench_accounts a (cost=0.00..20560.67 rows=416667 width=8) (actual time=0.023..43.151 rows=333333 loops=3) -> Hash (cost=1.12..1.12 rows=1 width=4) (actual time=0.027..0.028 rows=2 loops=3) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on pgbench_branches b (cost=0.00..1.12 rows=1 width=4) (actual time=0.018..0.020 rows=2 loops=3) Filter: (bbalance > 0) Rows Removed by Filter: 8 Planning Time: 0.342 ms Execution Time: 99.164 ms (22 rows)

那么,我们是否可以得出这样的结论:我们可以随意地编写查询,而PostgreSQL的智能将会处理其余的问题?!

等等!

如果我们考虑排除情况,事情会变得不同。

排除查询

查询要求:找出那些余额(balance)不大于0的每个分支(branch)在表在pgbench_accounts中有多少个账户

1.使用NOT IN

 SELECT count( aid ),bid FROM pgbench_accounts WHERE bid NOT IN ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 ) GROUP BY bid;

执行计划:

 QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize GroupAggregate (cost=23645.42..23647.95 rows=10 width=12) (actual time=128.606..130.502 rows=8 loops=1) Group Key: pgbench_accounts.bid -> Gather Merge (cost=23645.42..23647.75 rows=20 width=12) (actual time=128.598..130.490 rows=24 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=22645.39..22645.42 rows=10 width=12) (actual time=124.960..124.963 rows=8 loops=3) Sort Key: pgbench_accounts.bid Sort Method: quicksort Memory: 25kB Worker 0: Sort Method: quicksort Memory: 25kB Worker 1: Sort Method: quicksort Memory: 25kB -> Partial HashAggregate (cost=22645.13..22645.23 rows=10 width=12) (actual time=124.917..124.920 rows=8 loops=3) Group Key: pgbench_accounts.bid -> Parallel Seq Scan on pgbench_accounts (cost=1.13..21603.46 rows=208333 width=8) (actual time=0.078..83.134 rows=266667 loops=3) Filter: (NOT (hashed SubPlan 1)) Rows Removed by Filter: 66667 SubPlan 1 -> Seq Scan on pgbench_branches (cost=0.00..1.12 rows=1 width=4) (actual time=0.020..0.021 rows=2 loops=3) Filter: (bbalance > 0) Rows Removed by Filter: 8 Planning Time: 0.310 ms Execution Time: 130.620 ms (21 rows) postgres=# 

2.使用<>ALL

 SELECT count( aid ),bid FROM pgbench_accounts WHERE bid <> ALL ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 ) GROUP BY bid;

执行计划:

 QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Finalize GroupAggregate (cost=259581.79..259584.32 rows=10 width=12) (actual time=418.220..419.913 rows=8 loops=1) Group Key: pgbench_accounts.bid -> Gather Merge (cost=259581.79..259584.12 rows=20 width=12) (actual time=418.212..419.902 rows=24 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=258581.76..258581.79 rows=10 width=12) (actual time=413.906..413.909 rows=8 loops=3) Sort Key: pgbench_accounts.bid Sort Method: quicksort Memory: 25kB Worker 0: Sort Method: quicksort Memory: 25kB Worker 1: Sort Method: quicksort Memory: 25kB -> Partial HashAggregate (cost=258581.50..258581.60 rows=10 width=12) (actual time=413.872..413.875 rows=8 loops=3) Group Key: pgbench_accounts.bid -> Parallel Seq Scan on pgbench_accounts (cost=0.00..257539.83 rows=208333 width=8) (actual time=0.054..367.244 rows=266667 loops=3) Filter: (SubPlan 1) Rows Removed by Filter: 66667 SubPlan 1 -> Materialize (cost=0.00..1.13 rows=1 width=4) (actual time=0.000..0.001 rows=2 loops=1000000) -> Seq Scan on pgbench_branches (cost=0.00..1.12 rows=1 width=4) (actual time=0.001..0.001 rows=2 loops=337880) Filter: (bbalance > 0) Rows Removed by Filter: 8 Planning Time: 0.218 ms Execution Time: 420.035 ms (22 rows) postgres=# 

3.使用NOT EXISTS

 SELECT count( aid ),bid FROM pgbench_accounts WHERE NOT EXISTS ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 AND pgbench_accounts.bid = pgbench_branches.bid ) GROUP BY bid;

执行计划:

 QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize GroupAggregate (cost=28327.72..28330.25 rows=10 width=12) (actual time=152.024..153.931 rows=8 loops=1) Group Key: pgbench_accounts.bid -> Gather Merge (cost=28327.72..28330.05 rows=20 width=12) (actual time=152.014..153.917 rows=24 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=27327.70..27327.72 rows=10 width=12) (actual time=147.782..147.786 rows=8 loops=3) Sort Key: pgbench_accounts.bid Sort Method: quicksort Memory: 25kB Worker 0: Sort Method: quicksort Memory: 25kB Worker 1: Sort Method: quicksort Memory: 25kB -> Partial HashAggregate (cost=27327.43..27327.53 rows=10 width=12) (actual time=147.732..147.737 rows=8 loops=3) Group Key: pgbench_accounts.bid -> Hash Anti Join (cost=1.14..25452.43 rows=375000 width=8) (actual time=0.134..101.884 rows=266667 loops=3) Hash Cond: (pgbench_accounts.bid = pgbench_branches.bid) -> Parallel Seq Scan on pgbench_accounts (cost=0.00..20560.67 rows=416667 width=8) (actual time=0.032..45.174 rows=333333 loops=3) -> Hash (cost=1.12..1.12 rows=1 width=4) (actual time=0.036..0.037 rows=2 loops=3) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on pgbench_branches (cost=0.00..1.12 rows=1 width=4) (actual time=0.025..0.027 rows=2 loops=3) Filter: (bbalance > 0) Rows Removed by Filter: 8 Planning Time: 0.322 ms Execution Time: 154.040 ms (22 rows) postgres=# 

4.使用LEFT JOIN和IS NULL

 SELECT count( aid ),a.bid FROM pgbench_accounts a LEFT JOIN pgbench_branches b ON a.bid = b.bid AND b.bbalance > 0 WHERE b.bid IS NULL GROUP BY a.bid;

执行计划:

 QUERY PLAN ----------------------------------------------------------------------------------------------------------
                
                

-六神源码网