I have gone through this Stack Overflow question; according to the answer, Spark Streaming creates a DStream with only one RDD per batch interval.
For example:
My batch interval is 1 minute and my Spark Streaming job consumes data from a Kafka topic.
My question is: does the RDD available in the DStream pull/contain the entire data for the last one minute? Are there any criteria or options we need to set to pull all of the data created in the last one minute?
If I have a Kafka topic with 3 partitions, and all 3 partitions contain data for the last one minute, will the DStream pull/contain all the data created in the last one minute across all of the Kafka topic partitions?
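To make the setup concrete, here is a minimal sketch of the kind of job I mean, using the spark-streaming-kafka-0-10 direct stream API. The broker address, group id, and topic name are placeholders, and the topic is assumed to have 3 partitions:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("KafkaDStreamExample")
val ssc = new StreamingContext(conf, Minutes(1)) // 1-minute batch interval

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092",           // placeholder broker
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "example-group",             // placeholder group id
  "auto.offset.reset"  -> "latest"
)

val topics = Array("my-topic") // placeholder topic, assumed to have 3 partitions

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

// One RDD is handed to this block per 1-minute batch; I am trying to
// understand whether that single RDD holds everything produced to all
// 3 partitions of the topic during that minute.
stream.foreachRDD { rdd =>
  println(s"records in this batch: ${rdd.count()}, RDD partitions: ${rdd.getNumPartitions}")
}

ssc.start()
ssc.awaitTermination()
```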
Update:
In which case does a DStream contain more than one RDD?