
Am I understanding the documentation for client mode correctly?

  1. Client mode is the opposite of cluster mode, where the driver runs within the application master?
  2. In client mode, the driver and application master are separate processes, and therefore spark.driver.memory + spark.yarn.am.memory must be less than the machine's memory?
  3. In client mode, the driver memory is not included in the application master memory setting?
  • Hi, if any of the answers has solved your problem, please consider [accepting it](http://meta.stackexchange.com/q/5234/179419) or adding your own solution, so that it indicates to the wider community that you've found a solution. – mrsrinivas May 28 '18 at 06:28

2 Answers


Client mode is the opposite of cluster mode, where the driver runs within the application master?

Yes. When a Spark application is deployed over YARN (a spark-submit sketch follows this list):

  • In client mode, the driver runs on the machine where the application was submitted, and that machine must remain reachable on the network until the application completes.
  • In cluster mode, the driver runs inside the application master (one per Spark application), and the submitting machine need not remain on the network after submission.

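For reference, here is how the deploy mode is chosen at submission time. A minimal sketch; the application class and jar name (com.example.MyApp, my-app.jar) are placeholders:

```bash
# Client mode: the driver runs inside this spark-submit process, so the
# submitting machine must stay reachable until the application completes.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  my-app.jar

# Cluster mode: the driver runs inside the YARN application master, so the
# submitting machine may disconnect once the application is accepted.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar
```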
[Image: Client mode]

[Image: Cluster mode]

If a Spark application is submitted in cluster mode on Spark's own resource manager (standalone), the driver process runs on one of the worker nodes.

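A minimal sketch of such a standalone cluster-mode submission (the master URL, class, and jar are placeholders):

```bash
# With Spark's own (standalone) resource manager, cluster mode places the
# driver process on one of the worker nodes.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar
```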

In client mode, the driver and application master are separate processes, and therefore spark.driver.memory + spark.yarn.am.memory must be less than the machine's memory?

No. In client mode, the driver and AM are separate processes that exist on different machines, so their memory settings need not be combined. However, spark.yarn.am.memory plus some overhead must be less than the YARN container memory limit (yarn.nodemanager.resource.memory-mb); if the AM container exceeds it, YARN will kill the container.
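To illustrate, a sketch of sizing the AM container in client mode (the class and jar are placeholders; 512m matches the default):

```bash
# Client mode: the AM only negotiates resources, so a small heap usually
# suffices. spark.yarn.am.memory plus spark.yarn.am.memoryOverhead must fit
# within the NodeManager limit (yarn.nodemanager.resource.memory-mb).
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.yarn.am.memory=512m \
  --class com.example.MyApp \
  my-app.jar
```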

In client mode, the driver memory is not included in the application master memory setting?

Correct. Here, spark.driver.memory must be less than the available memory on the machine from which the Spark application is launched.

But in cluster mode, use spark.driver.memory instead of spark.yarn.am.memory.

spark.yarn.am.memory: 512m (default)

Amount of memory to use for the YARN Application Master in client mode, in the same format as JVM memory strings (e.g. 512m, 2g). In cluster mode, use spark.driver.memory instead. Use lower-case suffixes, e.g. k, m, g, t, and p, for kibi-, mebi-, gibi-, tebi-, and pebibytes, respectively.

Check more about these properties in the [Spark documentation on running on YARN](https://spark.apache.org/docs/latest/running-on-yarn.html).
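As a sketch of the cluster-mode counterpart (values and names are illustrative):

```bash
# Cluster mode: the driver lives inside the AM container, so it is sized
# with spark.driver.memory (or --driver-memory); spark.yarn.am.memory is
# not used in this mode.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --class com.example.MyApp \
  my-app.jar
```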

mrsrinivas
  • Great response. To clarify, the application master requests resources from the Resource Manager. But does the application master itself create the YARN containers and the Spark executors? – vi_ral May 31 '20 at 19:50
  • Thank you. This is handled by YARN: the application master requests containers from the RM based on data size and configuration, and the Spark executors run inside those YARN containers. – mrsrinivas Jun 01 '20 at 09:58
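To make that container sizing concrete, a sketch (sizes are illustrative; spark.executor.memoryOverhead assumes Spark 2.3+, earlier versions use spark.yarn.executor.memoryOverhead):

```bash
# Each executor runs inside a YARN container. The container request is
# roughly spark.executor.memory + spark.executor.memoryOverhead and must
# fit within yarn.scheduler.maximum-allocation-mb.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 4g \
  --conf spark.executor.memoryOverhead=512m \
  --class com.example.MyApp \
  my-app.jar
```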

In client mode, the driver is launched directly within the spark-submit process, i.e. the client program. The application master is created on one of the nodes in the cluster. spark.driver.memory (plus memory overhead) must be less than the client machine's memory.

In cluster mode, the driver runs inside the application master on one of the nodes in the cluster.
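One practical note, as a sketch: in client mode the driver JVM is the spark-submit process itself, so its heap must be fixed before that JVM starts, via the command line or spark-defaults.conf, not through SparkConf inside the application:

```bash
# Client mode: set the driver heap at launch time; setting
# spark.driver.memory programmatically inside the application would be
# too late, since the driver JVM is already running.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-memory 2g \
  --class com.example.MyApp \
  my-app.jar
```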

https://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/

Ravikumar