In my use case, I need to perform multiple aggregations in Spark Structured Streaming. This is not directly supported as of 2.4.x, but I have seen this thread (Multiple aggregations in Spark Structured Streaming).
As far as I understand, there are two options to achieve this:
The first option: perform the first aggregation, store its result in some temporary store using either "foreach" or "foreachBatch", and then read it back to perform the second aggregation. This step involves writing to external storage and may not be very efficient.
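A minimal sketch of what I mean by the first option, using foreachBatch to persist each micro-batch of the first aggregation (the Kafka source, topic, and output path are placeholders for illustration):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder.appName("multi-agg").getOrCreate()
import spark.implicits._

// Source stream (connection details are hypothetical)
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(value AS STRING) AS key")

// First aggregation
val firstAgg = events.groupBy($"key").count()

// Persist each micro-batch of the first aggregation to an intermediate
// sink; a second, separate query would then read this path and perform
// the second aggregation.
firstAgg.writeStream
  .outputMode("update")
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    batchDF.write.mode("append").parquet("/tmp/first_agg") // hypothetical path
  }
  .start()
```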
The second option, as mentioned in the thread (Multiple aggregations in Spark Structured Streaming), is to use "flatMapGroupsWithState". This looks promising, but I am not sure about its performance implications, as this method may involve shuffling (and I am not sure whether the shuffle can be optimized here).
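For the second option, this is roughly how I picture folding both aggregation steps into a single stateful pass (the case classes and the logic of combining a sum and a count into an average are my own illustration, assuming `events` is an existing streaming `Dataset[Event]`):

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

case class Event(key: String, value: Long)
case class AggState(sum: Long, count: Long)
case class Result(key: String, avg: Double)

// Update the per-key running state with each micro-batch's events and
// emit the derived second-level aggregate (here, an average).
def updateState(
    key: String,
    events: Iterator[Event],
    state: GroupState[AggState]): Iterator[Result] = {
  val old = state.getOption.getOrElse(AggState(0L, 0L))
  val updated = events.foldLeft(old) { (s, e) =>
    AggState(s.sum + e.value, s.count + 1)
  }
  state.update(updated)
  Iterator(Result(key, updated.sum.toDouble / updated.count))
}

val results = events
  .groupByKey(_.key)
  .flatMapGroupsWithState(OutputMode.Update, GroupStateTimeout.NoTimeout)(updateState)
```

My concern is that the `groupByKey` here still triggers a shuffle, and I do not know whether that can be avoided.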
Which of these two options is the better way to achieve multiple aggregations in Spark Structured Streaming, especially in terms of performance?