
groupByKey gives me an RDD[(key, value)], and I could not find any way to convert that into a Map[key, RDD[values]]. Thanks.

SV

  • Possible duplicate of [How to split a RDD into two or more RDDs?](http://stackoverflow.com/questions/32970709/how-to-split-a-rdd-into-two-or-more-rdds) –  Nov 03 '16 at 15:12

1 Answer


AFAIK there's no Spark primitive that would let you split an RDD by key like that. We use filtering to achieve a similar result, and performance-wise it should be a lot lighter than an actual groupByKey, because filter does not require a shuffle.

// Collect the distinct keys to the driver
val keys = rdd.keys.distinct.collect
// One filtered RDD per key (note the closing parenthesis before .toMap)
val dataByKey = keys.map(key => (key, rdd.filter(_._1 == key))).toMap

Note that the keys have to fit in the memory of the driver for this to work.
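To illustrate the same filter-per-key pattern without a Spark cluster, here is a minimal plain-Scala sketch using an in-memory Seq in place of the RDD; the collection name `pairs` and the sample data are made up for the example:

```scala
// Sample (key, value) pairs standing in for the RDD's contents
val pairs = Seq(("a", 1), ("b", 2), ("a", 3), ("c", 4))

// Distinct keys, analogous to rdd.keys.distinct.collect on the driver
val keys = pairs.map(_._1).distinct

// One filtered sub-collection per key, analogous to rdd.filter(_._1 == key)
val dataByKey: Map[String, Seq[(String, Int)]] =
  keys.map(key => (key, pairs.filter(_._1 == key))).toMap

println(dataByKey("a")) // List((a,1), (a,3))
```

In the Spark version each entry of the resulting Map is a lazily evaluated RDD, so each filter scans the parent RDD when that entry is actually computed.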

maasg