I have a Spark sequence of values V1, V2, V3, V4, V5, ... How can I group these values into (V1,V2,V3), (V4,V5,V6), (V7,V8,V9)?

The values can be grouped in arbitrary order, so I think using groupBy would add unnecessary overhead in terms of performance. Is there another way to do this?

user261706
1 Answer
There is no easy way to do this unless you have already ensured that the number of elements in each partition is divisible by the number you wish to group by. Let's assume it is, that the RDD is called nicelyPartitionedRdd, and that the size of each group you want is n. Then

    nicelyPartitionedRdd.mapPartitions(_.grouped(n))

would work. As for creating nicelyPartitionedRdd, you could use this answer: https://stackoverflow.com/a/25204589/1586965
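
For reference, here is a minimal, self-contained sketch of this approach. Only mapPartitions(_.grouped(n)) comes from the answer above; the object name, the local master, and the sample data are illustrative assumptions:

    import org.apache.spark.{SparkConf, SparkContext}

    object GroupedExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("grouped-example").setMaster("local[2]")
        val sc = new SparkContext(conf)

        val n = 3 // desired group size

        // 12 values in 2 partitions -> 6 per partition, divisible by n = 3,
        // so this RDD is "nicely partitioned" in the sense described above.
        val nicelyPartitionedRdd = sc.parallelize(1 to 12, 2)

        // Iterator.grouped(n) chunks each partition's iterator into Seqs of n
        // elements. Grouping happens independently per partition, so a partition
        // whose size is not divisible by n would emit one short trailing group.
        val groups = nicelyPartitionedRdd.mapPartitions(_.grouped(n))

        groups.collect().foreach(g => println(g.mkString("(", ",", ")")))
        // (1,2,3)
        // (4,5,6)
        // (7,8,9)
        // (10,11,12)

        sc.stop()
      }
    }

Note that no shuffle is involved here: each partition is chunked locally, which is why this is cheaper than a groupBy.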


samthebest