Spark shuffle service on a local shared directory with Ceph on Kubernetes

Asked Oct 13 '22 at 11:59

Active Oct 13 '22 at 11:59

Viewed 60 times

We run Spark 3.x on Kubernetes, and all executor pods share the same ReadWriteMany Ceph volume.

So all Spark workers write their shuffle data to the same volume (presumably into different directories), where it is readable by any other worker.
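For context, a setup like this can be expressed with Spark's Kubernetes volume settings. The sketch below is a minimal illustration, not our exact configuration: the claim name `ceph-shared` and the mount path `/data/spark-local` are hypothetical. It relies on the documented convention that a volume whose name starts with `spark-local-dir-` is used as Spark local storage (where shuffle files are written), and it builds the `spark.kubernetes.executor.volumes.persistentVolumeClaim.*` entries as `spark-submit --conf` arguments.

```python
# Hypothetical names: replace with the real PVC and mount path in the cluster.
CLAIM_NAME = "ceph-shared"          # hypothetical ReadWriteMany PVC backed by Ceph
VOLUME_NAME = "spark-local-dir-1"   # "spark-local-dir-" prefix marks it as local storage
MOUNT_PATH = "/data/spark-local"    # hypothetical mount path inside executor pods


def shared_local_dir_conf(claim_name: str, volume_name: str, mount_path: str) -> dict[str, str]:
    """Return Spark conf entries that mount one PVC into every executor pod.

    Because every executor references the same claimName, all executors
    see the same underlying ReadWriteMany volume.
    """
    prefix = f"spark.kubernetes.executor.volumes.persistentVolumeClaim.{volume_name}"
    return {
        f"{prefix}.options.claimName": claim_name,
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.mount.readOnly": "false",
    }


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Turn a conf dict into a flat list of spark-submit --conf arguments."""
    args: list[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    conf = shared_local_dir_conf(CLAIM_NAME, VOLUME_NAME, MOUNT_PATH)
    print(" ".join(to_submit_args(conf)))
```

Under each local directory Spark creates its own `blockmgr-*` subdirectory per executor, which fits the guess that each worker writes into a different directory. This only reproduces the shared mount; on its own it does not change how executors fetch each other's shuffle blocks.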

On the other hand, Spark still transfers shuffle data between workers over the network.

How can I configure Spark to read shuffle data written by other workers directly from the shared volume, rather than downloading it over TCP?

asked Oct 13 '22 at 11:59

Thomas Decaux

0 Answers