|
运行以下代码:
! m1 ]- s i* h; g" vval sales = Seq(3 d$ @ p( f) @6 ?2 W0 F# h
(0, 0, 0, 5),
; N/ |* H' u' B9 h4 U (1, 0, 1, 3),
% u% y, @8 j' S (2, 0, 2, 1),
0 a. K" A1 m) c" q9 F (3, 1, 0, 2),, r; }5 T+ X; f2 Y
(4, 2, 0, 8),+ p7 B& |) h1 d- Y8 U. G
(5, 2, 2, 8))0 J% |% O% _7 v* [2 l |, i
.toDF("id", "orderID", "prodID", "orderQty")" `5 V* Z6 O% g( z% z1 ^+ g+ _
val orderedByID = Window.orderBy('id)3 s8 B1 I' Q' j& ^
val totalQty = sum('orderQty).over(orderedByID).as('running_total)* R5 a1 T- t' O5 Q
val salesTotalQty = sales.select('*, totalQty).orderBy('id)
0 c7 j: ?! U4 csalesTotalQty.show
B5 h5 z& c" @6 ~2 t9 M# |4 y$ j结果是:
w6 a1 G' ]4 @* g$ y$ n+---+-------+------+--------+-------------+* A" k) \# |4 q! a
| id|orderID|prodID|orderQty|running_total|8 ]/ o/ m3 N2 P
+---+-------+------+--------+-------------+
- W& p( q1 |+ e+ B; o4 v4 e| 0| 0| 0| 5| 5|) P& b: d+ j% P, r* B
| 1| 0| 1| 3| 8|# R; r+ C3 ?$ m" D1 {
| 2| 0| 2| 1| 9|
! }" {& b: L( G# \2 B- R3 o8 ^| 3| 1| 0| 2| 11|( D6 a( \8 I- W' U* v# h+ R
| 4| 2| 0| 8| 19|/ i" A- e' B8 w. \. n/ S9 f- O
| 5| 2| 2| 8| 27|, P; _& b6 b" u
+---+-------+------+--------+-------------+
$ L. M7 s+ K+ Z% S1 z上面的代码中没有定义任何窗口框架,它看起来默认的窗口框架是 rowsBetween(Window.unboundedPreceding,0 }( M% }( l J4 @; d
Window.currentRow)
* S. C9 i, M" t8 w/ W) r, [不确定我对默认窗口框架的理解是否正确$ U5 Z$ A7 N' X2 T# Q0 m+ q
4 l' G# _8 w$ T$ Y; X
解决方案:5 |# S1 p( T" n3 z* S
$ w- b/ y2 w- y2 }0 J
5 @6 A; Q" F3 J7 i
, A9 B n2 W; m$ R; b- ]2 W 从Spark Gotchas% E1 h1 x1 T# F: a7 J9 L
( ^$ |& ~) R" p' a4 j
默认帧规格取决于给定窗口定义的其他方面:2 q* h8 v- S+ A6 c. R
如果指定了ORDER BY子句,并且该函数接受了帧规范,则该帧规范是由RANGE BETWEEN UNBOUNDED PRECEDING AND# @$ M0 F( G8 j( c; u9 B
CURRENT ROW定义的,否则,帧规格由未绑定的前导和未绑定的后继之间的行定义。2 n7 g8 w. p" [: ~4 j# S; o' \ S
# X, i; P( f/ |8 q+ J9 e3 ~4 P |
|