Is there any way to split Spark partitions without sending data over the network, i.e. without a shuffle? For example:
# p stands for partition
machine 1:
p1: 1,2 p2: 3,4
machine 2:
p3: 5,6 p4: 7,8
What I want to have is:
machine 1:
p1: 1 p2: 2 p3: 3 p4: 4
machine 2:
p5: 5 p6: 6 p7: 7 p8: 8
Is there any way to do this? (I think no network transfer or shuffle should be needed here.)
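To make the desired layout concrete, here is a small plain-Python model of it. This is only a sketch of the transformation being asked about, not Spark code: the `layout` dictionary and the `split_locally` helper are hypothetical names made up for illustration, and nothing here calls a Spark API.

```python
# Plain-Python model of the layout above; this is NOT Spark code.
# Keys are machine names, values are the partitions held on that machine.
layout = {
    "machine 1": [[1, 2], [3, 4]],
    "machine 2": [[5, 6], [7, 8]],
}


def split_locally(layout, pieces):
    """Split each partition into up to `pieces` chunks, keeping data on its machine."""
    result = {}
    for machine, partitions in layout.items():
        new_partitions = []
        for part in partitions:
            # Ceiling division; at least 1 so that empty partitions do not break range().
            size = max(1, -(-len(part) // pieces))
            new_partitions.extend(part[i:i + size] for i in range(0, len(part), size))
        result[machine] = new_partitions
    return result


split = split_locally(layout, 2)
print(split)
```

In this model every element stays under the machine key it started on, which is the "no network, no shuffle" property the question is about.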
PS:
This is the reverse of coalesce. If I call coalesce(2), then I suppose the result would be
machine 1: p1: 1,2,3,4 machine 2: p2: 5,6,7,8
where the data does not go over the network and no shuffle is triggered, whereas coalesce(1) will cause network transfer because all the data on machine 2 goes to machine 1?
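The assumption in this PS can be written down as a toy plain-Python model. It is only a sketch of what the question supposes, not Spark's actual coalesce algorithm: `coalesce_model` is a hypothetical helper, and its rule (one output partition per destination machine, the first `target` machines act as destinations) is an assumption made for illustration.

```python
# Toy model of the assumption in the PS; NOT Spark's real coalesce logic.
layout = {
    "machine 1": [[1, 2], [3, 4]],
    "machine 2": [[5, 6], [7, 8]],
}


def coalesce_model(layout, target):
    """Merge partitions into `target` partitions and count elements that change machine."""
    machines = list(layout)
    if not 1 <= target <= len(machines):
        raise ValueError("toy model only handles 1 <= target <= number of machines")
    merged = {m: [] for m in machines[:target]}
    moved = 0
    for index, machine in enumerate(machines):
        # Assumed rule: machines beyond the first `target` send their data elsewhere.
        destination = machines[index % target]
        for part in layout[machine]:
            merged[destination].extend(part)
            if destination != machine:
                moved += len(part)
    return merged, moved


print(coalesce_model(layout, 2))  # no element changes machine in this model
print(coalesce_model(layout, 1))  # everything on machine 2 has to move
```

Under this model, a target of 2 moves nothing, while a target of 1 moves the four elements from machine 2, which matches the behaviour the question supposes.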