By using cluster mode (i.e. `--deploy-mode cluster`), the resource allocation has the structure shown in the following diagram.

I will attempt to illustrate the calculations YARN makes for the resource allocation. First of all, the specs of each core node are the following (you can confirm them here):
- memory: 244 GB
- cores/vCPUs: 32
This means that you can run at most:
- 2 executors per core node, which is calculated based on the memory and cores requested. Specifically: available_cores / requested_cores = 32 / 13 = 2.46 -> 2, and available_mem / requested_mem = 244 / 90 = 2.71 -> 2.
- a single driver, with no room for an additional executor on the same core node. This is because when the driver runs on a core node, it leaves 244 - 180 = 64 GB of memory and 32 - 26 = 6 cores/vCPUs, which are not enough to run a separate executor.
So, from the existing pool of 60 core nodes, 1 node is used for the driver, leaving 59 core nodes that run 59 * 2 = 118 executors.
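As a small sanity check, here is the same arithmetic as a shell snippet (integer division plays the role of rounding down; the per-node and per-executor figures are the ones quoted above):

```bash
# Per-node container arithmetic; bash integer division floors the result,
# mirroring the fact that only whole executors can be scheduled.
node_mem_gb=244; node_cores=32   # core-node specs
exec_mem_gb=90;  exec_cores=13   # requested per executor (figures from above)
echo "executors by cores:  $(( node_cores / exec_cores ))"   # 32 / 13  -> 2
echo "executors by memory: $(( node_mem_gb / exec_mem_gb ))" # 244 / 90 -> 2
echo "total executors:     $(( 59 * 2 ))"                    # 59 nodes x 2 -> 118
```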
Does that mean my Master node was not used?
If you mean whether the master node was used to execute the driver, then the answer is no. However, note that the master node was probably running a number of other processes in the meantime, which are out of scope for this discussion (e.g. the YARN ResourceManager, the HDFS NameNode, etc.).
Is the driver running on the Master node or Core node?
The latter; the driver is running on a core node (since you used the `--deploy-mode cluster` parameter).
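For reference, a cluster-mode submission along these lines would look roughly as follows; the application file name is a placeholder and the resource values are only the approximate figures from the calculation above, not your exact command:

```bash
# Hypothetical cluster-mode submission (placeholder app name, approximate sizes):
# the driver container is placed on one of the core nodes by YARN.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 180g \
  --driver-cores 26 \
  --executor-memory 90g \
  --executor-cores 13 \
  my_app.py
```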
Can I make the driver run on the Master node and let the 60 core nodes host 120 working executors?
Yes! The way to do that is to execute the same command but with `--deploy-mode client` (or leave that parameter unspecified, since at the time of writing this is the default used by Spark) from the master node.
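A sketch of the client-mode variant, run from the master node (again, the file name and the executor count are placeholders rather than your actual command):

```bash
# Hypothetical client-mode submission executed on the master node:
# the driver runs locally on the master, leaving all 60 core nodes for executors.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --executor-memory 90g \
  --executor-cores 13 \
  --num-executors 120 \
  my_app.py
```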
By doing that, the resource allocation will have the structure shown in the following diagram.

Note that the Application Master will still consume some resources from the cluster ("stealing" some resources from the executors). However, the AM resources are minimal by default, as can be seen here (the `spark.yarn.am.memory` and `spark.yarn.am.cores` options), so it should not have a big impact.
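If you ever do need to change them, both are ordinary configuration properties that can be passed via `--conf` in client mode; the values below are purely illustrative (the defaults are 512m of memory and 1 core for the AM):

```bash
# Optional, illustrative AM overrides for a client-mode submission;
# the defaults (512m / 1 core) are usually sufficient.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.yarn.am.memory=1g \
  --conf spark.yarn.am.cores=1 \
  my_app.py
```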