I have a couple of basic questions about Spark Streaming
(apologies if these have already been answered in other posts; I couldn't find any):
(i) In Spark Streaming, is the number of partitions in an RDD by default equal to the number of workers?
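For what it's worth, this is the quick sketch I've been using to print the partition count of each batch's RDD (the socket source and `localhost:9999` are just placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BatchPartitions {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("BatchPartitions"), Seconds(10))

    // Placeholder source: a plain text socket stream
    val lines = ssc.socketTextStream("localhost", 9999)

    // Print how many partitions each batch's RDD actually has
    lines.foreachRDD { rdd =>
      println(s"This batch's RDD has ${rdd.partitions.length} partitions")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The counts I see don't obviously line up with the number of workers, hence the question.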
(ii) In the Direct Approach for Spark-Kafka integration, the number of RDD partitions created is equal to the number of Kafka partitions.
Is it right to assume that each RDD partition i would be mapped to the same worker node j in every batch of the DStream? In other words, is the mapping of a partition to a worker node based solely on the index of the partition? For example, could partition 2 be assigned to worker 1 in one batch and worker 3 in another?
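For reference, here is roughly how I'm creating the direct stream and logging which host processes which partition in each batch (a minimal sketch against the spark-streaming-kafka 0.8 direct API; the broker list and the "events" topic are placeholders):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, TaskContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object DirectStreamMapping {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("DirectStreamMapping"), Seconds(10))

    // Placeholder broker list and topic
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics = Set("events")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.foreachRDD { rdd =>
      // In the direct approach, RDD partitions map 1:1 to Kafka partitions
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.foreachPartition { _ =>
        val pid  = TaskContext.get.partitionId
        val host = java.net.InetAddress.getLocalHost.getHostName
        val o    = offsetRanges(pid)
        println(s"RDD partition $pid (Kafka ${o.topic}-${o.partition}) ran on $host")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Running this across several batches is how I noticed the partition-to-host assignment didn't look stable, which prompted question (ii).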
Thanks in advance