I am using twitter stream function which gives a stream. I am required to use Spark writeStream function like:writeStream function link
// Write key-value data from a DataFrame to a specific Kafka topic specified in an option
val ds = df
.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
.writeStream
.format("kafka")
.option("kafka.bootstrap.servers", "host1:port1,host2:port2")
.option("topic", "topic1")
.start()
The 'df' needs to be a streaming Dataset/DataFrame. If df is a normal DataFrame, it will give error showing 'writeStream' can be called only on streaming Dataset/DataFrame;
I have already done: 1. get stream from twitter 2. filter and map it to get a tag for each twitt (Positive, Negative, Natural)
The last step is to groupBy tag and count for each and pass it to Kafka.
Do you guys have any idea how to transform a Dstream into a streaming Dataset/DataFrame?
Edited: ForeachRDD function does change Dstream to normal DataFrame. But 'writeStream' can be called only on streaming Dataset/DataFrame. (writeStream link is provided above)
org.apache.spark.sql.AnalysisException: 'writeStream' can be called only on streaming Dataset/DataFrame;