We have a requirement where two incoming Datasets/DataFrames need to go through multiple operations (join, groupby, etc.) to reach a final state.

For example, the incoming dataframes are df1 and df2:

df3 = df1.groupby("key")
df4 = df3.join(df2)
...

Let's say that df7 is the final DataFrame that I need to send to writeStream.
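
To make the kind of logic concrete, here is a minimal sketch in plain Python. It is not Spark API code. It assumes the groupby step is a per-key count and the join is an inner join on the key; those choices, along with the names process_micro_batch, state, and lookup, are hypothetical and only for illustration. It mimics the general idea of keeping per-key state across micro-batches and applying the later steps inside the same function.

```python
from collections import defaultdict


def process_micro_batch(state, batch, lookup):
    """Update per-key running counts, then join the touched keys with lookup.

    state  : dict of key -> running count, carried across micro-batches
    batch  : list of (key, value) rows standing in for new df1 rows
    lookup : dict of key -> attribute, standing in for df2 (hypothetical)
    Returns only the rows whose key changed in this batch.
    """
    touched = []
    for key, _value in batch:
        state[key] += 1              # assumed aggregation: count per key
        if key not in touched:
            touched.append(key)      # remember first-seen order of keys
    # Assumed inner join: keys missing from lookup are dropped from the output.
    return [(key, state[key], lookup[key]) for key in touched if key in lookup]


state = defaultdict(int)
lookup = {"a": "alpha", "b": "beta"}

out1 = process_micro_batch(state, [("a", 1), ("b", 2), ("a", 3)], lookup)
out2 = process_micro_batch(state, [("a", 4), ("c", 5)], lookup)
```

The second call reuses the counts left behind by the first, which is the part a single stateless transformation cannot express. Only keys touched in a batch are emitted, loosely resembling update output mode.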

Questions:

  1. Is there a way to achieve this in Structured Streaming?
  2. What is the main reason this is not supported in a straightforward manner?

PS: I came across this question, which suggests a possible solution using flatMapGroupsWithState: Multiple aggregations in Spark Structured Streaming. For my first question, could you please give an example of how the scenario above can be done using flatMapGroupsWithState? My second question is not covered by that link.

Thanks in advance

Shaido
Vindhya G
  • A join is allowed, so I cannot follow it really. – thebluephantom May 04 '20 at 09:23
  • We have multiple joins and groupby operations, basically multiple transformations on a DataFrame. I just added ... at the end to indicate that. – Vindhya G May 04 '20 at 10:09
  • Unclear what you are asking as some things are allowed in some output modes and not others. For 15 pts a heck of a lot. – thebluephantom May 04 '20 at 10:10
  • In both update and append mode, a DataFrame is currently not allowed to have multiple aggregations, if you refer to the link I attached. – Vindhya G May 04 '20 at 10:12
  • What I want is to apply a chain of transformations to a DataFrame and then send it to a sink in update or append mode. – Vindhya G May 04 '20 at 10:13
  • Clear, so why ask a question about something that is not allowed as per the latest Spark Structured Streaming manual? The title of your question should change if you want flatMapG...; otherwise the answer is simply "not allowed". – thebluephantom May 04 '20 at 10:20
  • The important question is the reason behind not allowing it, to understand it better. That's the question title. I want to understand why it's not allowed when it was allowed in Spark Streaming. I have asked the two questions clearly in the body too. – Vindhya G May 04 '20 at 10:23

0 Answers