Understanding the shuffle in spark

Question

Shuffling in spark is (as per my understanding):

Identify the partition that the records have to go to (Hashing and modulo)
Serialize the data that needs to go to the same partition
transmit the data
The data gets deserialized and read by the executors on the other end

I have a question about this:

How is the data transmitted between the executors? Even if we have the space available in Memory. Let us assume our execution memories are 50GiB per executor and the entire data to be shuffled is just 100 MB. Is the data transmission from Storage memory (exec 1) to Storage memory (exec 2) or are there disk writes involved as intermediate steps?

score 2 · Accepted Answer · answered Dec 13 '22 at 14:51

2

Spark shuffle outputs are always written to disk.

Why ? because simply you cannot send data from an executor memory to another executor memory directly, it has to be written locally than loaded into the executor memory, that's why you have serialization deserialization during shuffling, that's why having a quality disks (ssd) is also important for spark.

from blog.scottlogic.com

During a shuffle, data is written to disk and transferred across the network, halting Spark’s ability to do processing in-memory and causing a performance bottleneck.

answered Dec 13 '22 at 14:51

Abdennacer Lachiheb

4,388
7
30
61

1

Can you please also tell me if the fetched data is also written to disk. i.e. on the reducer side? More concretely, is the flow .. map --> disk --> executor (reducer side) --> disk? – figs_and_nuts Dec 27 '22 at 11:25
But serialising doesn't imply that you need to write to disk. – Denziloe Jul 15 '23 at 01:40

Understanding the shuffle in spark

1 Answers1