I have a streaming job that reads its input data from a Kafka queue. I also have reference data of 1 million rows, which is updated every hour. I load the reference data on the driver and then broadcast it to the workers. I would like to update this broadcast variable (on the driver) and resend it to the workers.
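To make the setup concrete, here is a rough sketch of the kind of thing I have in mind, not something I have verified in production. It assumes PySpark: `sc` is the `SparkContext`, and `load_fn` is a hypothetical function that reads the reference data on the driver. The class name `RefreshableBroadcast` and the `max_age_s` and `clock` parameters are made up for illustration. The idea is to keep the current broadcast in a driver-side holder and create a new broadcast once the data is older than an hour.

```python
import time


class RefreshableBroadcast:
    """Driver-side holder that re-broadcasts reference data once it is stale.

    `sc` only needs a `broadcast(value)` method (as SparkContext has);
    `load_fn` is a hypothetical loader that returns the reference data.
    """

    def __init__(self, sc, load_fn, max_age_s=3600.0, clock=time.monotonic):
        self._sc = sc
        self._load_fn = load_fn
        self._max_age_s = max_age_s
        self._clock = clock          # injectable so the logic can be tested
        self._bc = None              # current broadcast handle
        self._loaded_at = None       # clock reading at the last reload

    def get(self):
        """Return the current broadcast, reloading it first if it is too old.

        Must be called on the driver, once per batch.
        """
        now = self._clock()
        stale = self._bc is None or (now - self._loaded_at) >= self._max_age_s
        if stale:
            old = self._bc
            # Broadcast variables are read-only, so create a new one
            # instead of trying to mutate the existing one.
            self._bc = self._sc.broadcast(self._load_fn())
            self._loaded_at = now
            if old is not None:
                # Drop the cached copies of the old data on the executors.
                old.unpersist()
        return self._bc
```

With a DStream, I would expect to call `get()` inside the function passed to `foreachRDD` (which runs on the driver for every batch) and then use the returned handle's `.value` inside the tasks, roughly `stream.foreachRDD(lambda rdd: process(rdd, ref.get()))`, where `process` is a hypothetical function of mine. I am not sure whether this is the recommended approach, hence the question.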
What would be the best way to do this within Spark itself, without introducing HBase, Redis, Cassandra, etc.?
And how reliable would such an approach be?
Do let me know if more information is needed. Thank you in advance. =)