We have a requirement where two incoming Datasets/DataFrames need to go through multiple operations (join, groupBy, etc.) to reach a final state.
For example, the incoming dataframes are `df1` and `df2`:
```
df3 = df1.groupBy("key").agg(...)
df4 = df3.join(df2, "key")
...
```
Let's say that `df7` is the final dataframe that I need to send to `writeStream`.
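
For concreteness, here is a minimal, untested Scala sketch of the kind of chain we would like to run end to end. The `rate` sources, the derived `key` column, and the `count` aggregation are placeholders I made up purely for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("ChainedStreamingOps").getOrCreate()
import spark.implicits._

// Two illustrative streaming sources; "rate" and the derived "key" column
// stand in for our real inputs
val df1 = spark.readStream.format("rate").load()
  .withColumn("key", col("value") % 10)
val df2 = spark.readStream.format("rate").load()
  .withColumn("key", col("value") % 10)

// The chain we would like to express: aggregate, then join, then possibly more steps
val df3 = df1.groupBy("key").agg(count(lit(1)).as("cnt"))
val df4 = df3.join(df2, "key")
// ... further groupBy/join steps would eventually produce df7 ...

// As far as I can tell, start() is where Spark rejects chains like this on
// streaming DataFrames (an AnalysisException about unsupported operations)
val query = df4.writeStream
  .outputMode("append")
  .format("console")
  .start()

query.awaitTermination()
```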
Questions:
- Is there a way to achieve this in Structured Streaming?
- What is the major reason this is not supported in a straightforward manner?
PS: I came across this question and a possible solution using `flatMapGroupsWithState`: Multiple aggregations in Spark Structured Streaming.
Can you please give an example of how the above scenario can be done using `flatMapGroupsWithState`? That would answer my first question; my second question is not covered by the link above.
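
For reference, this is roughly the shape I understand `flatMapGroupsWithState` to have, based on the linked answer. It is an untested Scala sketch; the `Event`/`Summary` case classes, the `rate` source, and the per-key counting logic are placeholders I made up, and I do not see how to extend this pattern to my multi-step groupBy/join chain:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Placeholder types for illustration only
case class Event(key: String, value: Long)
case class Summary(key: String, count: Long)

val spark = SparkSession.builder().appName("FlatMapGroupsWithStateSketch").getOrCreate()
import spark.implicits._

// A single illustrative input stream keyed by a made-up "key" column
val events = spark.readStream.format("rate").load()
  .selectExpr("CAST(value % 10 AS STRING) AS key", "value")
  .as[Event]

// Keep a running count per key in arbitrary state instead of a second groupBy/agg
val summaries = events
  .groupByKey(_.key)
  .flatMapGroupsWithState[Summary, Summary](OutputMode.Append(), GroupStateTimeout.NoTimeout()) {
    (key: String, rows: Iterator[Event], state: GroupState[Summary]) =>
      val previous = state.getOption.getOrElse(Summary(key, 0L))
      val updated  = previous.copy(count = previous.count + rows.size)
      state.update(updated)
      Iterator.single(updated)
  }

summaries.writeStream
  .outputMode("append")
  .format("console")
  .start()
  .awaitTermination()
```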
Thanks in advance