Spark Structured Streaming - Stream Static Join trying to cache static data every microbatch

Asked Nov 21 '21 at 06:57

I am trying to perform a stream-static join. My static table is less than 500 MB, and I cached it so that refreshing the underlying table won't affect my stream-static join. When I checked the DAG, I noticed that the .cache() step is executed in every micro-batch.

Is it true in Spark Structured Streaming that, even if we cache the static dataset, that step is executed again in every micro-batch?

Edited Dec 05 '21 at 21:36 by Matthias J. Sax; asked by Sridhar Viswanathan

Can you show the code, please? – thebluephantom Nov 21 '21 at 09:06
https://stackoverflow.com/questions/66154867/stream-static-join-how-to-refresh-unpersist-persist-static-dataframe-periodic/66451431#66451431 – thebluephantom Nov 21 '21 at 09:39
Executed, vs. seeing whether the data that is processed actually changed? – thebluephantom Nov 21 '21 at 11:17

0 Answers