
I am trying to understand how Spark handles shuffle dependencies under the hood. Thus I have two questions:

  1. In Spark, how does an executor know which other executors it has to pull data from?

    • Does each executor, after finishing its map-side task, report its status and location to some central entity (maybe the driver), and does the reduce-side executor first contact the driver to get the location of each executor to pull from, and then pull from those executors directly?
  2. In a job with a shuffle dependency, does the driver schedule joins (or other tasks on the shuffle dependency) only after all map-side tasks have finished?

    • Does that mean each task notifies the driver about its status, and the driver orchestrates the other dependent tasks in a timely manner?
Gopal

1 Answer


I will answer your questions point by point:

1. How does an executor know which other executors it has to pull data from? Simply put, an executor doesn't know what the other executors are doing, but the driver does. You can think of this process as a queen and workers: the queen pushes tasks to the executors, and each one reports back with its results when it finishes.

2. Does each executor, after finishing its map-side task, report its status and location to some central entity (maybe the driver)?

Yes. The driver monitors the process. When you create the SparkContext, each worker starts an executor. This is a separate process (JVM), and it loads your jar too. The executors connect back to your driver program, and the driver can then send them commands such as flatMap, map and reduceByKey. When the driver quits, the executors shut down. You can also have a look at this answer: What is a task in Spark? How does the Spark worker execute the jar file?
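To make that concrete, here is a minimal sketch of such a driver program (the HDFS paths and app name are hypothetical); the driver builds the lineage, and reduceByKey introduces the shuffle dependency the question asks about:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ShuffleDemo {
  def main(args: Array[String]): Unit = {
    // Creating the SparkContext makes each worker start an executor (a
    // separate JVM) that connects back to this driver process.
    val sc = new SparkContext(new SparkConf().setAppName("shuffle-demo"))

    val counts = sc.textFile("hdfs:///tmp/input.txt") // hypothetical input path
      .flatMap(_.split("\\s+")) // map side: narrow transformation
      .map(word => (word, 1))   // map side: narrow transformation
      .reduceByKey(_ + _)       // shuffle dependency: forces a stage boundary

    counts.saveAsTextFile("hdfs:///tmp/output") // action: triggers the job
    sc.stop()                                   // executors shut down with the driver
  }
}
```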

3. Does the reduce-side executor first contact the driver to get the location of each executor to pull from, and then pull from those executors directly? A reduce task is preferred to run on the same machine the data lives on, so there will not be any shuffle traffic unless the data is not available locally and there are no free resources.
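That locality preference is tunable through the scheduler's wait settings; a quick sketch (the values shown are Spark's defaults):

```scala
import org.apache.spark.SparkConf

// spark.locality.wait controls how long the scheduler waits for a data-local
// slot before falling back to a less local (and more network-bound) one.
val conf = new SparkConf()
  .setAppName("locality-demo")
  .set("spark.locality.wait", "3s")      // global wait before downgrading locality
  .set("spark.locality.wait.node", "3s") // override for node-local scheduling
  .set("spark.locality.wait.rack", "3s") // override for rack-local scheduling
```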

4. In a job with a shuffle dependency, does the driver schedule joins (or other tasks on the shuffle dependency) only after all map-side tasks have finished?

It is configurable; you can change it. Have a look at this link for more information: https://0x0fff.com/spark-architecture-shuffle/
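You can see the stage boundary that a shuffle dependency introduces by printing an RDD's lineage (a quick sketch, assuming an existing SparkContext named sc):

```scala
val pairs  = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
val summed = pairs.reduceByKey(_ + _)

// The indentation in the printed lineage marks the ShuffleDependency between
// the map-side ParallelCollectionRDD and the reduce-side ShuffledRDD.
println(summed.toDebugString)
```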

5. Does that mean each task notifies the driver about its status, and the driver orchestrates the other dependent tasks in a timely manner?

Each task notifies the driver and sends heartbeats to it, and Spark implements a speculative-execution technique: if any task fails or runs slowly, Spark launches another copy of it. More details here: http://asyncified.io/2016/08/13/leveraging-spark-speculation-to-identify-and-re-schedule-slow-running-tasks/
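Speculation is off by default and is controlled by a handful of settings; a minimal sketch (the values shown for the last three are Spark's defaults):

```scala
import org.apache.spark.SparkConf

// If a task runs much slower than its peers, the scheduler launches a second
// copy of it and keeps the result of whichever copy finishes first.
val conf = new SparkConf()
  .set("spark.speculation", "true")           // enable speculative execution
  .set("spark.speculation.interval", "100ms") // how often to check for stragglers
  .set("spark.speculation.multiplier", "1.5") // "slow" = 1.5x the median task time
  .set("spark.speculation.quantile", "0.75")  // check only after 75% of tasks finish
```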

Moustafa Mahmoud
  • Thanks Mostafa. The info is very helpful. One clarification I need on question #3: assume there is a task that has a ShuffledRDD dependency. Per my understanding, each map task will write its output to **its local disk**, and the number of files will depend on the number of reducers and partitions. When reducers are scheduled, they fetch this data **directly from the mapper nodes**. So, reducers need to know the network addresses of the mapper nodes and also the names of the files written for each reducer. How do reducers get this info? (I have gone through the MapOutputTracker source code but could not get a clear understanding.) – Gopal Feb 13 '17 at 13:42
  • **First**: Spark's default configuration is not to write to disk (it can be changed to **MEMORY_AND_DISK, DISK_ONLY or MEMORY_ONLY**), but keep in mind that if it is disk, performance will be very bad. **Second**: considering your case, the driver is actually the main channel between map and reduce. So the reducer doesn't know any information unless the driver informs it; also, all configuration is applied from the driver, not the reducer. In summary: the map task finishes, then informs the driver, then the driver fires the reducer to start based on your job configuration. Kindly mark my answer as accepted if it is. – Moustafa Mahmoud Feb 15 '17 at 05:39
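For readers who, like Gopal, looked at MapOutputTracker: below is a deliberately simplified, hypothetical model of that bookkeeping. The class and method names are illustrative only, loosely echoing Spark's internal MapOutputTracker, and this is not Spark's actual API. The shape of the flow: each map task registers the host and sizes of its shuffle output with the driver, and reducers then query the driver for those locations before fetching directly from the mapper nodes.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the per-map-task shuffle output metadata.
final case class MapStatus(executorHost: String, shuffleFileSizes: Array[Long])

// Lives on the driver. Map tasks register where their shuffle files are;
// reduce-side executors ask for those locations before fetching directly.
class MapOutputTrackerModel {
  private val outputs = mutable.Map.empty[(Int, Int), MapStatus] // (shuffleId, mapId)

  def registerMapOutput(shuffleId: Int, mapId: Int, status: MapStatus): Unit =
    outputs((shuffleId, mapId)) = status

  def getMapOutputLocations(shuffleId: Int): Seq[MapStatus] =
    outputs.collect { case ((id, _), status) if id == shuffleId => status }.toSeq
}
```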