What are a Job, a Task and a Stage in Apache Spark, and what is the difference between them?
Possible duplicate of [What is a task in Spark? How does the Spark worker execute the jar file?](https://stackoverflow.com/questions/25276409/what-is-a-task-in-spark-how-does-the-spark-worker-execute-the-jar-file) – Alper t. Turker Jan 12 '18 at 14:34
1 Answer
A stage is a physical unit of execution: a step in the physical execution plan. A stage is a set of parallel tasks, one task per partition of the RDD, each computing a partial result of the function executed as part of a Spark job.
A job is a parallel computation consisting of multiple tasks that is spawned in response to a Spark action (e.g. save, collect); you'll see this term used in the driver's logs.
A task is a command sent from the driver to an executor by serializing your Function object. The executor deserializes the command (this is possible because it has loaded your jar), and executes it on a partition.
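As a rough illustration (a minimal sketch, not part of the original answer; names like `JobStageTaskDemo` are made up), the snippet below triggers exactly one job with two stages, split at the shuffle boundary introduced by `reduceByKey`, and each stage runs one task per partition:

    import org.apache.spark.sql.SparkSession

    object JobStageTaskDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("job-stage-task-demo")
          .master("local[4]")   // 4 local threads acting as executors
          .getOrCreate()
        val sc = spark.sparkContext

        // An RDD with 4 partitions -> 4 parallel tasks in each stage.
        val words = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"), numSlices = 4)

        // Narrow transformation: stays in the same stage.
        val pairs = words.map(w => (w, 1))

        // Wide transformation: reduceByKey requires a shuffle, so the
        // physical plan is split into a second stage at this boundary.
        val counts = pairs.reduceByKey(_ + _)

        // The action triggers exactly one job; the driver's DAGScheduler
        // breaks it into 2 stages and ships one serialized task per
        // partition to the executors.
        println(counts.collect().toList)

        spark.stop()
      }
    }

You can confirm this by opening the Spark UI (http://localhost:4040 by default while the application runs): the Jobs tab shows one job for the `collect`, and drilling into it shows two stages with four tasks each.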
For more information about how they work, see https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-DAGScheduler-Stage.html
