If I understand correctly, when a reduce task gathers its input shuffle blocks (the outputs of different map tasks), it first keeps them in memory (Q1). When the shuffle-reserved memory of an executor (before the change in memory management; Q2) is exhausted, the in-memory data is "spilled" to disk. If spark.shuffle.spill.compress is true, that in-memory data is written to disk in compressed form.
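For concreteness, these are the settings I'm referring to. This is only a minimal sketch assuming the legacy (pre-unified) memory model; the property names and defaults may differ across Spark versions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the shuffle-related settings discussed above (legacy memory model).
val conf = new SparkConf()
  .setAppName("shuffle-spill-question")
  // Compress data that is spilled to disk during shuffles (default: true).
  .set("spark.shuffle.spill.compress", "true")
  // Fraction of the executor heap reserved for shuffle before spilling
  // (legacy setting, documented default 0.2).
  .set("spark.shuffle.memoryFraction", "0.2")

val sc = new SparkContext(conf)
```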
My questions:
Q0: Is my understanding correct?
Q1: Is the data gathered by the reduce task always kept uncompressed in memory?
Q2: How can I estimate the amount of executor memory available for gathering shuffle blocks (see the rough estimate sketch after these questions)?
Q3: I've seen the claim that "shuffle spill happens when your dataset cannot fit in memory", but to my understanding, as long as the shuffle-reserved executor memory is large enough to hold all the (uncompressed) shuffle input blocks of all of its ACTIVE tasks, no spill should occur. Is that correct?
If so, to avoid spills one needs to make sure that the (uncompressed) data which ends up in all parallel reduce-side tasks is smaller than the executor's shuffle-reserved portion of memory?
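Regarding Q2, this is the back-of-the-envelope estimate I have in mind. It assumes the legacy memory model; the fractions are the documented defaults of spark.shuffle.memoryFraction and spark.shuffle.safetyFraction, and the per-task share is only approximate because it depends on how many tasks actually run concurrently:

```scala
// Rough estimate of memory available for gathering shuffle blocks per executor
// (legacy, pre-unified memory management). All concrete numbers are illustrative.
val executorHeapBytes     = 8L * 1024 * 1024 * 1024  // e.g. --executor-memory 8g
val shuffleMemoryFraction = 0.2                       // spark.shuffle.memoryFraction (default)
val shuffleSafetyFraction = 0.8                       // spark.shuffle.safetyFraction (default)
val concurrentTasks       = 4                         // e.g. spark.executor.cores

val shuffleBytesPerExecutor =
  (executorHeapBytes * shuffleMemoryFraction * shuffleSafetyFraction).toLong
val shuffleBytesPerTask = shuffleBytesPerExecutor / concurrentTasks

println(f"shuffle memory per executor: ${shuffleBytesPerExecutor / 1e9}%.2f GB")
println(f"shuffle memory per task:     ${shuffleBytesPerTask / 1e9}%.2f GB")
```

Is that roughly the right way to reason about it, or am I missing something?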