I'm using Spark Structured Streaming for the use case below:
I have a Kafka topic, `data`. From this topic I stream data in real time using Structured Streaming and apply some filters to it. If the number of rows returned after applying the filters is greater than 1, the output is 1; otherwise the output is 0, along with some other data from the query.
In simple terms, suppose I'm filtering the stream using:
df.filter($"A" < 10)
where "A", "<" and "10" are dynamic and come from a database. In fact, these values come from a Kafka topic that I consume, and I update them in the DB, so the query is not static and will change after some time.
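To make the dynamic part concrete, here is a minimal sketch of what I mean by building the filter from DB values. The `Rule` class and the field names are just assumptions for illustration; the idea is that `Dataset.filter` also accepts a SQL expression string, so the condition can be assembled at runtime (the Spark call itself is shown as a comment):

```scala
// Hypothetical rule row as read from the database (names are assumptions)
case class Rule(column: String, op: String, value: String)

val rule = Rule("A", "<", "10")

// Assemble a SQL expression string that Dataset.filter() accepts,
// e.g. df.filter("A < 10")
val condition = s"${rule.column} ${rule.op} ${rule.value}"
println(condition)  // A < 10

// val filtered = df.filter(condition)  // Spark: filter by SQL expression string
```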
Further, I'll have to apply some Boolean operators to the results of the streams. For example (pseudocode):
df.filter($"A" < 10) AND df.filter($"B" === 1) OR df.filter($"C" > 1) ... and so on
Here, each atomic operation (like df.filter($"A" < 10)) returns either 0 or 1, as described above. The final result is saved to Mongo.
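The combination step I'm after can be sketched like this (plain Scala, with the Spark counts faked as inputs; `toFlag` and the sample counts are assumptions, not my real code):

```scala
// Each atomic filter yields 1 if at least one row passed, else 0
def toFlag(matchedRows: Long): Int = if (matchedRows > 0) 1 else 0

val a = toFlag(5)  // e.g. rows passing df.filter($"A" < 10)
val b = toFlag(0)  // e.g. rows passing df.filter($"B" === 1)
val c = toFlag(2)  // e.g. rows passing df.filter($"C" > 1)

// (A < 10) AND (B = 1) OR (C > 1), evaluated on the 0/1 flags
val finalFlag = if ((a == 1 && b == 1) || c == 1) 1 else 0
println(finalFlag)  // 1 -- this is what would be written to Mongo
```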
I want to know whether both of these problems can be solved using Structured Streaming. If not, can they be solved with RDDs?
Otherwise, can someone suggest another way to do this?