I'm building a Spark Structured Streaming application that performs calculations on data received from Kafka every 10 seconds (a rough sketch of the stream setup is below).
For some of these calculations, I need to look up information about sensors and their placement in a Cassandra database.
I'm stuck on how to keep the Cassandra data available throughout the cluster, and how to refresh it from time to time in case we have made changes to the database table.
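For context, the streaming side is set up roughly like this (the bootstrap servers, topic name, and payload parsing are simplified placeholders, not my exact code):

val sensorStateDf = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
  .option("subscribe", "sensor-state")                 // placeholder topic
  .load()
  // the Kafka value is parsed into columns such as plantKey
  .selectExpr("CAST(value AS STRING) AS json")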
Currently, I query the database as soon as I start Spark (locally, for now) using the DataStax Spark Cassandra Connector:
import org.apache.spark.sql.cassandra._

val cassandraSensorDf = spark
  .read
  .cassandraFormat("specifications", "sensors")
  .load()
From here on I can use this cassandraSensorDf by joining it with my Structured Streaming Dataset:
val joinedDf = sensorStateDf
  .join(
    cassandraSensorDf,
    sensorStateDf("plantKey") <=> cassandraSensorDf("cassandraPlantKey")
  )
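The joined stream is then started with a 10-second trigger, roughly like this (the console sink is just for illustration):

import org.apache.spark.sql.streaming.Trigger

val query = joinedDf
  .writeStream
  .format("console") // illustrative sink
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start()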
How do I run additional queries to refresh this Cassandra data while the Structured Streaming job is running? And how can I make the queried data available to all executors in a cluster setting?
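The only idea I've come up with so far is to cache the lookup DataFrame and re-read it on a schedule, something like the sketch below, but I doubt that reassigning a var is ever picked up by the already-running streaming query:

// Idea only, not verified: periodically unpersist and re-read the lookup table.
var cassandraSensorDf = spark
  .read
  .cassandraFormat("specifications", "sensors")
  .load()
cassandraSensorDf.cache()

// ... later, on some schedule, from a separate thread:
cassandraSensorDf.unpersist()
cassandraSensorDf = spark
  .read
  .cassandraFormat("specifications", "sensors")
  .load()
cassandraSensorDf.cache()
// Does the running streaming join ever see this re-read data?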