I have a Spark sequence of values V1, V2, V3, V4, V5, ... How can I group these values into (V1,V2,V3), (V4,V5,V6), (V7,V8,V9)?

The values can be grouped in arbitrary order, so I think using groupBy would add unnecessary overhead in terms of performance. Is there another way to do this?

user261706
1 Answer
There is no easy way to do this unless you have already ensured that the number of elements in each partition is divisible by the number you wish to group by. Let's assume it is, that the RDD is called nicelyPartitionedRdd, and that the size of each group you want is n. Then

    nicelyPartitionedRdd.mapPartitions(_.grouped(n))

would work. As for creating nicelyPartitionedRdd, you could use this answer: https://stackoverflow.com/a/25204589/1586965
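
For reference, here is a minimal, self-contained sketch of this approach. Only mapPartitions(_.grouped(n)) comes from the answer above; the object name, the local master, and the sample data are illustrative assumptions:

    import org.apache.spark.{SparkConf, SparkContext}

    object GroupedExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("grouped-example").setMaster("local[2]")
        val sc = new SparkContext(conf)

        val n = 3 // desired group size

        // 12 values in 2 partitions -> 6 per partition, divisible by n = 3,
        // so this RDD is "nicely partitioned" in the sense described above.
        val nicelyPartitionedRdd = sc.parallelize(1 to 12, 2)

        // Iterator.grouped(n) chunks each partition's iterator into Seqs of n
        // elements. Grouping happens independently per partition, so a partition
        // whose size is not divisible by n would emit one short trailing group.
        val groups = nicelyPartitionedRdd.mapPartitions(_.grouped(n))

        groups.collect().foreach(g => println(g.mkString("(", ",", ")")))
        // (1,2,3)
        // (4,5,6)
        // (7,8,9)
        // (10,11,12)

        sc.stop()
      }
    }

Note that no shuffle is involved here: each partition is chunked locally, which is why this is cheaper than a groupBy.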


samthebest