groupByKey gives me an RDD[key, value], and I could not find any way to convert this into a Map[key, RDD[values]]. Thanks.
SV
AFAIK there's no Spark primitive that would let you split an RDD by key like that. We use filtering to achieve a similar result, and performance-wise it should be a lot lighter than an actual groupByKey, because filter does not require a shuffle.
val keys = rdd.keys.distinct.collect
val dataByKey = keys.map(key => (key, rdd.filter(_._1 == key))).toMap
Note that the set of distinct keys has to fit in the memory of the driver for this to work.
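
For context, here is a minimal end-to-end sketch of the same filtering approach. The SparkContext setup, the sample data, and names like SplitByKeyExample are illustrative assumptions, not part of the original answer:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SplitByKeyExample {
  def main(args: Array[String]): Unit = {
    // Local context just for demonstration purposes.
    val sc = new SparkContext(new SparkConf().setAppName("split-by-key").setMaster("local[*]"))

    // Illustrative pair RDD; in the original question this would be the RDD returned upstream.
    val rdd: RDD[(String, Int)] = sc.parallelize(Seq(
      ("a", 1), ("a", 2), ("b", 3), ("c", 4), ("b", 5)
    ))

    // Collect the distinct keys to the driver, then build one filtered RDD per key.
    val keys = rdd.keys.distinct.collect
    val dataByKey: Map[String, RDD[(String, Int)]] =
      keys.map(key => (key, rdd.filter(_._1 == key))).toMap

    // Each map entry is a lazily evaluated RDD containing only that key's records.
    dataByKey.foreach { case (k, perKeyRdd) =>
      println(s"$k -> ${perKeyRdd.values.collect.mkString(", ")}")
    }

    sc.stop()
  }
}

Since each filter rescans the source RDD when that entry is evaluated, caching the source (e.g. rdd.cache()) before building the map can avoid recomputing it once per key.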