你知道PostgreSQL支持表的部分索引吗?这使得读取部分索引是非常快速的,几乎没有什么索引开销。如果你想重复地分析那些匹配给定的 WHERE 子句的行数据,那部分索引是最好的索引数据的方式。这使得PostgreSQL非常适合于那种涉及预聚集额外的特别分析的工作流。在这一点上,我将通过一个非常适合使用部分索引的查询优化示例来实践下。
test=# select path, cost, sum(cost) over (orderby cost desc) as sum_cost, sum(cost) over (orderby cost descrowsbetweencurrentrowand2 following ) asrowfrom tcost;
注意,上面那条SQL没有写 partition by ,那默认情况下就是以整个表来表示窗口化的,即只有一个窗口。现在试着,添加上partition by 语句的结果看看:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
test=# select path, cost, sum(cost) over (orderby cost desc) as sum_cost, sum(cost) over (partitionby path orderby cost descrowsbetweencurrentrowand2 following ) asrowfrom tcost;
path | cost | sum_cost | row
------+-------+----------+-------
32 | 3.4 | 433.7 | 3.84
32 | 0.4 | 434.1 | 0.44
32 | 0.04 | 434.14 | 0.04
111 | 33.4 | 400.2 | 60.1
111 | 23.3 | 423.5 | 26.7
111 | 3.4 | 433.7 | 3.4
222 | 333.4 | 333.4 | 370.2
222 | 33.4 | 400.2 | 36.8
222 | 3.4 | 433.7 | 3.4
(9 rows)
test=#
当 partition by path时,可以看到,这些ROWS 模式,都是在当前所在的窗口来进行的,并不会跨窗口来进行。 所以,在这里强调一下,ROWS表示的是物理行。
RANGE
看看,当是RANGE时的结果
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
test=# select path, cost, sum(cost) over (orderby cost desc) as sum_cost, sum(cost) over (orderby cost desc range betweencurrentrowand UNBOUNDED following ) as range from tcost;
test=# select path, cost, sum(cost) over (orderby cost desc) as sum_cost, sum(cost) over (orderby cost desc range betweencurrentrowand UNBOUNDED following ) as range,sum(cost) over (orderby cost descrowsbetweencurrentrowand UNBOUNDED following ) asrowfrom tcost;
[ RANGE | ROWS ] BETWEEN frame_start AND frame_end
frame_start 和 frame_end可以是:
1
2
3
4
5
UNBOUNDED PRECEDING
value PRECEDING
CURRENT ROW
value FOLLOWING
UNBOUNDED FOLLOWING
特别注意:value PRECEDING和value FOLLOWING 当前只允许ROWS模式。 RANGE模式后面只能接 UNBOUNDED FOLLOWING。
默认的框架选项是RANGE UNBOUNDED PRECEDING,该选项与 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW相同。有ORDER BY, 它设置框架从分区的开始一直到与当前行相同的最后一行。没有ORDER BY, 那么就是当前分区的所有行都包含在框架中,因为所有行都会成为当前行的相同行。
RANGE时,请注意有没有 order by 的区别
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
test=# select path, cost, sum(cost) over (orderby cost desc) as sum_cost, sum(cost) over (range betweencurrentrowand UNBOUNDED following ) as no_order_by_range,sum(cost) over (orderby cost desc range betweencurrentrowand UNBOUNDED following ) as has_order_by_range from tcost;
test=# select path, cost, sum(cost) over (orderby cost desc) as sum_cost, sum(cost) over (rowsbetweencurrentrowand UNBOUNDED following ) as no_order_by_rows,sum(cost) over (orderby cost descrowsbetweencurrentrowand UNBOUNDED following ) as has_order_by_rows from tcost;
-> Index Only Scan using tgroup_point on tgroup (cost=0.43..522965.94rows=11534336width=4)
-> CTE Scan on cte (cost=0.00..230686.72rows=11534336width=8)
(5 rows)
Time: 0.909 ms
总结
看来没有 WHERE 或其他条件过滤数据而且数据量非常大的情况下,不适宜使用CTE表达式,因为它本质是一个一次性视图,生成一张这么大的视图,性能也快不到哪里去(可能使用物化视图会好点,不过没有测试过)。在大量数据情况下,还是使用普通的全表扫描比使用生成CTE再全表扫描来得快。这也应验了之前翻译篇文章的强调:CTE表达式的作用,真的不是为了加快查询速度,而仅仅是为了方便。冏 ~~。