
We run Spark 3.x on Kubernetes, and all executor pods share the same ReadWriteMany Ceph volume.

So all Spark executors write their shuffle data to the same volume (presumably into different directories), where it is readable by every other executor.
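For context, here is a minimal sketch of how such a shared volume might be mounted into every executor as Spark local (shuffle/spill) storage. It only builds `spark-submit --conf` arguments. The `spark.kubernetes.executor.volumes.persistentVolumeClaim.[VolumeName].*` keys are Spark-on-Kubernetes settings, and Spark treats volumes whose name starts with `spark-local-dir-` as local storage. The claim name `shared-ceph-rwx`, the mount path `/data/spark-local` and the volume name `spark-local-dir-1` are hypothetical placeholders, not our actual values.

```python
import shlex

# Hypothetical placeholders: not the actual claim name or path from our cluster.
CLAIM_NAME = "shared-ceph-rwx"       # ReadWriteMany PVC backed by Ceph (assumed name)
MOUNT_PATH = "/data/spark-local"     # where the PVC is mounted in each executor (assumed)
# Volumes named "spark-local-dir-*" are used by Spark on Kubernetes as local storage.
VOLUME_NAME = "spark-local-dir-1"


def executor_pvc_confs(volume_name, claim_name, mount_path):
    """Return the Spark confs that mount one PVC into every executor pod."""
    prefix = f"spark.kubernetes.executor.volumes.persistentVolumeClaim.{volume_name}"
    return {
        f"{prefix}.options.claimName": claim_name,
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.mount.readOnly": "false",
    }


def to_submit_args(confs):
    """Turn a conf dict into a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(confs.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    confs = executor_pvc_confs(VOLUME_NAME, CLAIM_NAME, MOUNT_PATH)
    print(" ".join(shlex.quote(a) for a in to_submit_args(confs)))
```

With a setup like this, every executor writes shuffle files under the same mounted path, yet reducers still fetch remote blocks through the block transfer service.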

On the other hand, Spark still fetches shuffle data from other executors over the network.

How can I configure Spark to read shuffle data written by other executors directly from the shared volume, rather than downloading it over TCP?

Thomas Decaux
