Consider the following intended SQL:
select row_number()
over (partition by Origin order by OnTimeDepPct desc) OnTimeDepRank,*
from flights
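To make the intended result concrete, here is a minimal plain-Python sketch (not Spark code) of what that query would compute on a static batch of rows. The sample rows and the names `flights`, `rank_within_origin`, and `ranked_flights` are hypothetical; only the column names `Origin`, `OnTimeDepPct`, and `OnTimeDepRank` come from the query.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows, invented purely for illustration.
flights = [
    {"Origin": "SFO", "OnTimeDepPct": 81.5},
    {"Origin": "JFK", "OnTimeDepPct": 74.0},
    {"Origin": "SFO", "OnTimeDepPct": 90.2},
    {"Origin": "JFK", "OnTimeDepPct": 88.1},
    {"Origin": "SFO", "OnTimeDepPct": 67.3},
]


def rank_within_origin(rows):
    """Emulate row_number() over (partition by Origin order by OnTimeDepPct desc)."""
    ranked = []
    # Sort by the partition key so groupby sees each Origin as one contiguous run.
    by_origin = sorted(rows, key=itemgetter("Origin"))
    for _, group in groupby(by_origin, key=itemgetter("Origin")):
        # Within a partition, order by OnTimeDepPct descending and number from 1.
        ordered = sorted(group, key=itemgetter("OnTimeDepPct"), reverse=True)
        for position, row in enumerate(ordered, start=1):
            ranked.append({"OnTimeDepRank": position, **row})
    return ranked


ranked_flights = rank_within_origin(flights)
```

Each origin gets its own ranking that restarts at 1, which is the per-`Origin` behavior the streaming version would need to preserve.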
This will not work in Structured Streaming, as shown in the question "Spark - Non-time-based windows are not supported on streaming DataFrames/Datasets" and in my own answer to that question: https://stackoverflow.com/a/55777253/1056563
The culprit is:
partition by Origin
The requirement is to use a timestamp-typed field such as
partition by flightTime
This requirement comes from a definitive source (a core committer for Spark Streaming), who specified that timestamp-based aggregations are supported. The syntax in that case uses window:
window("timestamp", "10 minutes")
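As a rough illustration of what a 10-minute time window does to each row, the sketch below assigns a timestamp to its fixed, non-overlapping bucket in plain Python. It is not Spark code: `tumbling_window` and `EPOCH` are hypothetical names, and the alignment of buckets to the Unix epoch with no start offset is an assumption based on my understanding of Spark's default behavior.

```python
from datetime import datetime, timedelta, timezone

# Assumption: buckets are aligned to 1970-01-01 00:00:00 UTC with no start offset.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumbling_window(ts, width=timedelta(minutes=10)):
    """Return the [start, end) bucket of the given width that contains ts."""
    elapsed = ts - EPOCH
    # Integer-divide the elapsed time by the width to find the bucket index.
    start = EPOCH + (elapsed // width) * width
    return start, start + width


# Hypothetical event time, used only to show the bucketing.
sample_start, sample_end = tumbling_window(
    datetime(2019, 4, 20, 12, 37, 15, tzinfo=timezone.utc)
)
```

Rows are grouped by this time bucket rather than by an arbitrary column, which is why a grouping keyed only on Origin does not fit this mechanism directly.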
There is actually an additional complication: Structured Streaming does not support correlated subqueries. Therefore the generally useful answers from the esteemed Gordon Linoff here: https://stackoverflow.com/a/46856508/1056563 cannot be applied.
What then for my query above, which must be based on the Origin field? What is the closest equivalent to that query? Or what would be a workaround or a different approach to achieve the same results?