January 1, 2015

The difference between ROWS and RANGE modes in PostgreSQL window functions

The sample table contains the following data:

test=# select * from tcost ;
 path | cost 
------+-------
  111 |  23.3
  111 |  33.4
  111 |   3.4
  222 |   3.4
  222 |  33.4
  222 | 333.4
   32 |   3.4
   32 |   0.4
   32 |  0.04
(9 rows)
 
test=#
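
The post does not show how tcost was created. To reproduce the examples, a minimal sketch of the table could look like this. The column types (integer for path, numeric for cost) are assumptions inferred from the psql output, not taken from the original.

```sql
-- Assumed schema: the original post does not show the DDL.
-- numeric is chosen so that the sums come out exactly as displayed.
CREATE TABLE tcost (
    path integer,
    cost numeric
);

-- Same nine rows as in the listing above, in the same order.
INSERT INTO tcost (path, cost) VALUES
    (111, 23.3),
    (111, 33.4),
    (111, 3.4),
    (222, 3.4),
    (222, 33.4),
    (222, 333.4),
    (32,  3.4),
    (32,  0.4),
    (32,  0.04);
```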

ROWS

test=# select path, cost, sum(cost) over (order by cost desc) as sum_cost, sum(cost) over (order by cost desc rows between current row and 2 following ) as row from tcost;
 path | cost  | sum_cost |  row 
------+-------+----------+-------
  222 | 333.4 |    333.4 | 400.2
  111 |  33.4 |    400.2 |  90.1
  222 |  33.4 |    400.2 |  60.1
  111 |  23.3 |    423.5 |  30.1
  111 |   3.4 |    433.7 |  10.2
   32 |   3.4 |    433.7 |   7.2
  222 |   3.4 |    433.7 |  3.84
   32 |   0.4 |    434.1 |  0.44
   32 |  0.04 |   434.14 |  0.04
(9 rows)
 
test=#

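As you can see, each value in the row column is the sum() over the current row and the 2 rows that follow it (rows between current row and 2 following).
For example:
400.2 = 333.4 + 33.4 + 33.4
90.1 = 33.4 + 33.4 + 23.3

Note that the SQL above has no partition by clause, so by default the whole table is treated as a single partition, that is, there is only one window. Now add a partition by clause and look at the result: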

test=# select path, cost, sum(cost) over (order by cost desc) as sum_cost, sum(cost) over (partition by path order by cost desc rows between current row and 2 following ) as row from tcost;
 path | cost  | sum_cost |  row 
------+-------+----------+-------
   32 |   3.4 |    433.7 |  3.84
   32 |   0.4 |    434.1 |  0.44
   32 |  0.04 |   434.14 |  0.04
  111 |  33.4 |    400.2 |  60.1
  111 |  23.3 |    423.5 |  26.7
  111 |   3.4 |    433.7 |   3.4
  222 | 333.4 |    333.4 | 370.2
  222 |  33.4 |    400.2 |  36.8
  222 |   3.4 |    433.7 |   3.4
(9 rows)
 
test=#

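With partition by path, the ROWS frame is always evaluated inside the partition of the current row and never reaches across a partition boundary.
To emphasize the point: ROWS counts physical rows.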
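
Because ROWS counts physical rows, the value computed for rows with equal cost depends on the order in which those tied rows happen to be sorted, and that order is not guaranteed. One way to make the result repeatable is to add a tie-breaker to order by. The sketch below uses path as the tie-breaker, which is an assumption that works for this sample data because each (path, cost) pair is unique.

```sql
-- Tie-breaker sketch: (cost DESC, path) gives every row a unique position,
-- so the ROWS frame always covers the same three physical rows.
SELECT path,
       cost,
       sum(cost) OVER (ORDER BY cost DESC, path
                       ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS row_sum
FROM tcost
ORDER BY cost DESC, path;
```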

RANGE

Now look at the result in RANGE mode:

test=# select path, cost, sum(cost) over (order by cost desc) as sum_cost, sum(cost) over (order by cost desc range  between current row and  UNBOUNDED  following ) as range from tcost;
 path | cost  | sum_cost | range 
------+-------+----------+--------
  222 | 333.4 |    333.4 | 434.14
  111 |  33.4 |    400.2 | 100.74
  222 |  33.4 |    400.2 | 100.74
  111 |  23.3 |    423.5 |  33.94
  111 |   3.4 |    433.7 |  10.64
   32 |   3.4 |    433.7 |  10.64
  222 |   3.4 |    433.7 |  10.64
   32 |   0.4 |    434.1 |   0.44
   32 |  0.04 |   434.14 |   0.04
(9 rows)
 
test=#

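As you can see, in RANGE mode rows with the same value in the order by column (peer rows) are handled as a group. The frame always contains all peers of the current row, so rows with the same cost get the same range value, and that value is computed over the whole group.

Putting both modes side by side: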

test=# select path, cost, sum(cost) over (order by cost desc) as sum_cost, sum(cost) over (order by cost desc range  between current row and  UNBOUNDED  following ) as range,sum(cost) over (order by cost desc rows between current row and UNBOUNDED  following ) as row from tcost;
 path | cost  | sum_cost | range  |  row  
------+-------+----------+--------+--------
  222 | 333.4 |    333.4 | 434.14 | 434.14
  111 |  33.4 |    400.2 | 100.74 | 100.74
  222 |  33.4 |    400.2 | 100.74 |  67.34
  111 |  23.3 |    423.5 |  33.94 |  33.94
  111 |   3.4 |    433.7 |  10.64 |  10.64
   32 |   3.4 |    433.7 |  10.64 |   7.24
  222 |   3.4 |    433.7 |  10.64 |   3.84
   32 |   0.4 |    434.1 |   0.44 |   0.44
   32 |  0.04 |   434.14 |   0.04 |   0.04
(9 rows)
 
test=#

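The difference between RANGE and ROWS is very clear here. For the two rows with cost 33.4, RANGE returns 100.74 for both, while ROWS returns 100.74 for the first and 67.34 for the second.
ROWS: the frame is delimited by physical rows.
RANGE: the frame is delimited logically, by value.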
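
The same contrast can be sketched on a tiny inline data set that does not depend on tcost. The VALUES list and the column name v are made up for illustration.

```sql
-- Running total over 1, 2, 2, 3 in both modes.
-- range_sum comes out as 1, 5, 5, 8: the two peers with v = 2 share one frame.
-- rows_sum takes the values 1, 3, 5, 8: the two tied rows get 3 and 5,
-- and which of them gets which is not guaranteed.
SELECT v,
       sum(v) OVER (ORDER BY v
                    ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum,
       sum(v) OVER (ORDER BY v
                    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
FROM (VALUES (1), (2), (2), (3)) AS t(v)
ORDER BY v;
```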

Syntax of RANGE and ROWS in PostgreSQL

[ RANGE | ROWS ] frame_start
[ RANGE | ROWS ] BETWEEN frame_start AND frame_end

frame_start and frame_end can be one of:

UNBOUNDED PRECEDING
value PRECEDING
CURRENT ROW
value FOLLOWING
UNBOUNDED FOLLOWING

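Important: in the PostgreSQL version this post was written against, value PRECEDING and value FOLLOWING are allowed only in ROWS mode. (PostgreSQL 11 and later also accept them in RANGE mode and add a GROUPS mode.)
In RANGE mode the frame bounds are therefore limited to UNBOUNDED PRECEDING, CURRENT ROW and UNBOUNDED FOLLOWING.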
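
For readers on PostgreSQL 11 or later, where RANGE also accepts a value offset, a hedged sketch of such a frame follows. It was not part of the original tests and will be rejected by older servers. The offset 10 and the alias within_10 are arbitrary choices for illustration.

```sql
-- Requires PostgreSQL 11 or later.
-- With ORDER BY cost DESC, "10 FOLLOWING" extends the frame to rows whose
-- cost is at most 10 below the current row's cost, however many rows that is.
SELECT path,
       cost,
       sum(cost) OVER (ORDER BY cost DESC
                       RANGE BETWEEN CURRENT ROW AND 10 FOLLOWING) AS within_10
FROM tcost;
```

Unlike rows between current row and 2 following, the number of rows in this frame varies from row to row, because the bound is a distance in value and not a row count.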

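The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to all rows from the start of the partition through the last peer of the current row. Without ORDER BY, all rows of the partition are included in the frame, because every row is a peer of the current row.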
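
A quick way to check the default is to write the frame out explicitly and compare. In the sketch below the aliases implicit_frame and explicit_frame are illustrative names, and both columns should match the sum_cost column from the earlier examples.

```sql
-- The first window relies on the default frame; the second spells it out.
SELECT path,
       cost,
       sum(cost) OVER (ORDER BY cost DESC) AS implicit_frame,
       sum(cost) OVER (ORDER BY cost DESC
                       RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS explicit_frame
FROM tcost;
```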

RANGE: note the difference with and without order by

test=# select path, cost, sum(cost) over (order by cost desc) as sum_cost, sum(cost) over (range  between current row and  UNBOUNDED  following ) as no_order_by_range,sum(cost) over (order by cost desc range between current row and UNBOUNDED  following ) as has_order_by_range from tcost;
 path | cost  | sum_cost | no_order_by_range | has_order_by_range
------+-------+----------+-------------------+--------------------
  222 | 333.4 |    333.4 |            434.14 |             434.14
  111 |  33.4 |    400.2 |            434.14 |             100.74
  222 |  33.4 |    400.2 |            434.14 |             100.74
  111 |  23.3 |    423.5 |            434.14 |              33.94
  111 |   3.4 |    433.7 |            434.14 |              10.64
   32 |   3.4 |    433.7 |            434.14 |              10.64
  222 |   3.4 |    433.7 |            434.14 |              10.64
   32 |   0.4 |    434.1 |            434.14 |               0.44
   32 |  0.04 |   434.14 |            434.14 |               0.04
(9 rows)
 
test=#

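Without ORDER BY, all rows of the current partition are included in the frame, because every row is a peer of the current row. That is why no_order_by_range is 434.14, the total of the whole table, on every row.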
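
The word "partition" matters here. A sketch that adds partition by path, with illustrative aliases, shows that a RANGE frame without order by covers only the partition of the current row. Both columns should equal the per-path total: 60.1 for 111, 370.2 for 222 and 3.84 for 32.

```sql
-- No ORDER BY in either window: every row in a partition is a peer of
-- every other row, so the frame is the whole partition.
SELECT path,
       cost,
       sum(cost) OVER (PARTITION BY path
                       RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS range_no_order,
       sum(cost) OVER (PARTITION BY path) AS partition_total
FROM tcost;
```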

ROWS: note the difference with and without order by

test=# select path, cost, sum(cost) over (order by cost desc) as sum_cost, sum(cost) over (rows  between current row and  UNBOUNDED  following ) as no_order_by_rows,sum(cost) over (order by cost desc rows between current row and UNBOUNDED  following ) as has_order_by_rows from tcost;
 path | cost  | sum_cost | no_order_by_rows | has_order_by_rows
------+-------+----------+------------------+-------------------
  222 | 333.4 |    333.4 |           434.14 |            434.14
  111 |  33.4 |    400.2 |           100.74 |            100.74
  222 |  33.4 |    400.2 |            67.34 |             67.34
  111 |  23.3 |    423.5 |            33.94 |             33.94
  111 |   3.4 |    433.7 |            10.64 |             10.64
   32 |   3.4 |    433.7 |             7.24 |              7.24
  222 |   3.4 |    433.7 |             3.84 |              3.84
   32 |   0.4 |    434.1 |             0.44 |              0.44
   32 |  0.04 |   434.14 |             0.04 |              0.04
(9 rows)
 
test=#

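With or without ORDER BY, the two columns are identical here, because ROWS divides the frame by physical rows and not by logical (peer) rows. Keep in mind, though, that without ORDER BY the order in which rows reach the window function is unspecified. The columns match in this query most likely because the rows arrive already sorted by the order by cost desc of the other windows, so this should not be relied on in general.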

Summary

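ROWS: selects the frame inside the partition by physical rows.
RANGE: selects the frame inside the partition by logical rows. In RANGE mode, peer rows are treated as a single unit in the calculation, so peer rows get the same window result.
Whether rows are peers is determined by the ORDER BY ordering.
With ORDER BY: peers are rows that ORDER BY does not distinguish, that is, rows with the same value in the ordering column.
               Non-peers are rows with different values in the ORDER BY column.
Without ORDER BY: all rows of the current partition are included in the frame, because every row is a peer of the current row. (Pay particular attention to that last clause.)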

PostgreSQL DBA

Zhiyong Yang
