I have the following code and I'm wondering why it generates only one batch:
df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "IP").option("subscribe", "Topic").option("startingOffsets","earliest").load()
// groupby on slidings windows
query = slidingWindowsDF.writeStream.queryName("bla").outputMode("complete").format("memory").start()
The application is launched with the following parameters:
spark.streaming.backpressure.initialRate 5
spark.streaming.backpressure.enabled True
The kafka topic contains around 11 million messages. I'm expecting that it should at least generate two batches due to the initialRate parameter, but it generates only one. Can anyone tell why spark is processing my code in only one batch?
I'm using Spark 2.2.1 and Kafka 1.0.