In a Spark Streaming application I have an external data source (a relational database) that I need to query every 10 minutes, and I need to make the results available to my stream processing pipeline. I don't quite understand what the right way of doing this is.
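For context, the pipeline looks roughly like this (the source, batch interval, and names below are placeholders; the ??? marks where the periodically refreshed result set is needed):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Placeholder pipeline: each event has to be enriched with the
// latest result set from the relational database.
val conf = new SparkConf().setAppName("enrichment")
val ssc = new StreamingContext(conf, Seconds(30)) // interval is a placeholder
val events = ssc.socketTextStream("host", 9999)   // placeholder source
val enriched = events.map { e =>
  (e, ???) // lookup against the refreshed reference data goes here
}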
Accumulators are append-only (as described in the documentation), but I found this in the Spark source:
/**
 * Set the accumulator's value; only allowed on master
 */
def setValue(newValue: R) {
  this.value = newValue
}
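That said, if I read the 1.6 API correctly, an accumulator's value can only be read on the driver; calling .value inside a task throws, so even setValue could not push a result set out to executors. To illustrate what I mean with a plain numeric accumulator:

// Tasks may only add() to an accumulator; reading .value inside a
// task throws UnsupportedOperationException in Spark 1.x.
val acc = sc.accumulator(0)
val rdd = sc.parallelize(1 to 10)
rdd.foreach(_ => acc += 1)           // fine: append-only from tasks
// rdd.map(_ => acc.value).collect() // would fail: value is driver-only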
Broadcast variables, meanwhile, are write-once:
https://spark.apache.org/docs/1.6.2/programming-guide.html#broadcast-variables
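The workaround I keep seeing suggested (including in the question linked at the end) is to re-create the broadcast on the driver once it is stale, instead of mutating it. A minimal sketch of that idea; the class name, value type, JDBC URL, and query are all placeholders:

import java.sql.DriverManager
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Sketch: holds a broadcast of the reference data and re-broadcasts it
// on the driver whenever it is older than ttlMillis.
class RefDataHolder(sc: SparkContext, ttlMillis: Long) extends Serializable {
  @transient private var lastUpdate = 0L
  @transient private var bc: Broadcast[Map[String, String]] = _

  // Runs on the driver; plain JDBC with a placeholder URL and query.
  private def loadFromDb(): Map[String, String] = {
    val conn = DriverManager.getConnection("jdbc:postgresql://db/ref")
    try {
      val rs = conn.createStatement().executeQuery("SELECT key, label FROM ref_table")
      val m = scala.collection.mutable.Map.empty[String, String]
      while (rs.next()) m += rs.getString(1) -> rs.getString(2)
      m.toMap
    } finally conn.close()
  }

  // Must be called on the driver (e.g., inside transform or foreachRDD).
  def get: Broadcast[Map[String, String]] = synchronized {
    val now = System.currentTimeMillis()
    if (bc == null || now - lastUpdate > ttlMillis) {
      if (bc != null) bc.unpersist(blocking = false) // drop the stale copy
      bc = sc.broadcast(loadFromDb())                // write-once, so make a new one
      lastUpdate = now
    }
    bc
  }
}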
The scheduling aspect is also unclear to me: how do I trigger the database query every 10 minutes from inside a streaming application?
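One way I can imagine handling it is to piggyback on the batch cadence: call the holder once per batch on the driver inside transform and let it decide whether 10 minutes have passed. Again a sketch, reusing the placeholder names from above; it would replace the ??? in my first snippet:

// Driver side: get may re-broadcast; executors only ever read ref.value.
val holder = new RefDataHolder(ssc.sparkContext, ttlMillis = 10 * 60 * 1000L)
val enriched = events.transform { rdd =>
  val ref = holder.get                // checked once per batch, on the driver
  rdd.map(e => (e, ref.value.get(e))) // executor side: read-only lookup
}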
Is there a way to make an updated result set available to the stream processing logic?
P.S. It seems very similar to what I need: How can I update a broadcast variable in spark streaming?