
What is the relationship between a Spark executor and a YARN container when using Spark on YARN?
For example, if I set executor-memory = 20G and the YARN container memory to 10G, does 1 executor contain 2 containers?

no123ff

3 Answers


A Spark executor runs within a YARN container, and a YARN container is provided by the Resource Manager on demand. A YARN container can hold one or more Spark executors. Spark executors are what run the tasks, and each executor is started on a worker node (DataNode).

In your case, when you set executor-memory = 20G, you are asking for a container of 20 GB in which your executors will run. You might then have one or more executors using this 20 GB of memory, and this is per worker node.

So, for example, if you have a cluster of 8 nodes, that is 8 * 20 GB of total memory for your job.

Below are the 3 config options available in yarn-site.xml that you can play around with to see the differences.

yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb
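
For illustration, a minimal yarn-site.xml sketch with these three properties; the values below are placeholder numbers, not recommendations:

```xml
<configuration>
  <!-- Smallest container YARN will allocate; requests are rounded up to a multiple of this -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <!-- Largest single container request YARN will grant -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>24576</value>
  </property>
  <!-- Total memory a NodeManager offers to all containers on that node -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>49152</value>
  </property>
</configuration>
```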
AJm
  • Great answer, except when you say 'per worker node'. I believe there can be multiple containers within a worker node. And then, as you said, each such container can have multiple executors sharing the same memory. – Gadam May 24 '21 at 06:39
  • Also, in this YARN context, is there a concept of a worker process, since we already have a container? – Gadam May 24 '21 at 07:01

When running Spark on YARN, each Spark executor runs as a YARN container. This means the number of containers will always be the same as the number of executors created by a Spark application, e.g. via the --num-executors parameter of spark-submit.

https://stackoverflow.com/a/38348175/9605741
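
As a sketch of that mapping (the class name and jar below are hypothetical), a submission like this asks YARN for 4 executor containers, plus one more for the driver in cluster mode:

```bash
# Hypothetical submission: --num-executors 4 -> 4 executor containers;
# in cluster mode YARN allocates one additional container for the
# driver/ApplicationMaster.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 4G \
  --class com.example.MyApp \
  myapp.jar
```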

S.Bao

In YARN mode, each executor runs in one container. The number of executors is the same as the number of containers allocated from YARN (except in cluster mode, where YARN allocates one more container to run the driver).
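
One way to see this one-to-one mapping on a live application, using the standard YARN CLI (the application and attempt ids below are placeholders to substitute):

```bash
# Find the Spark application's id among running YARN applications
yarn application -list
# List its attempts, then the containers of an attempt; with --num-executors N
# in cluster mode you should see N + 1 containers (executors + driver/AM)
yarn applicationattempt -list application_1234567890123_0001
yarn container -list appattempt_1234567890123_0001_000001
```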

BrightFlow
  • If I set --executor-memory 20G in Spark, and set the memory allocated to a container to 10G in YARN, what will happen? Will 1 Spark executor get 10G of memory and contain only 1 container? – no123ff Feb 06 '18 at 05:46
  • If you set the maximum memory for the container to 10G, the job will not run, and you will receive an error like `Required executor memory (XXX MB) is above the max threshold (YYY MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'` – BrightFlow Feb 06 '18 at 06:27
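
For a rough sense of the numbers in that error: with executor-memory = 20G, Spark asks YARN for 20 GB plus the executor memory overhead, which by default (an assumption that holds for recent Spark versions via spark.executor.memoryOverhead) is max(384 MiB, 0.10 * executor memory) = 2 GB, so roughly a 22 GB container. Since 22 GB exceeds a 10 GB yarn.scheduler.maximum-allocation-mb, YARN can never grant the request, and the job fails up front rather than receiving a smaller container.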