5

In a YARN application, how does the ApplicationMaster decide on the size of the container? I understand there are parameters controlling the minimum memory allocation, the vcore ratio, etc. But how does the ApplicationMaster work out how much memory and how many CPUs it needs for a particular job - either MapReduce or Spark?

greenhorntechie
  • 386
  • 2
  • 6
  • 13
  • I have answered a similar question here, for MapReduce jobs: http://stackoverflow.com/questions/33004487/yarn-container-understanding-and-tuning/33038730#33038730 – Manjunath Ballur Dec 21 '15 at 08:46
  • Thanks @ManjunathBallur for your answer. This is my understanding after reading your comments and your response in the other thread: depending on the size of the data to be processed, the number of mappers is decided (1 mapper per input split), and the number of reducers is provided programmatically. Let's say these are m and r respectively. So we will have m+r containers requested by the AM from the RM. The resource size of each container (both memory and vcores) is decided by the parameters mentioned in your post. Is my understanding correct? – greenhorntechie Dec 21 '15 at 09:33
  • Exactly. For MapReduce jobs, the number of containers will be equal to the number of mappers + the number of reducers. Only in the case of Uber jobs can the containers be re-used for the mappers. And the amount of memory and the number of vCores needed for mappers and reducers is decided based on the settings in mapred-site.xml and yarn-site.xml – Manjunath Ballur Dec 21 '15 at 09:44

1 Answer

5

First, let me explain in a line or two how YARN works; then we'll go through the questions.

So let's assume we have 100GB of total YARN cluster memory and a minimum-allocation-mb of 1GB; then we have at most 100 containers. If we set the minimum allocation to 4GB, then we have at most 25 containers.
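
To make that arithmetic concrete, here is a minimal Java sketch (a toy calculation, not YARN code; the method name maxContainers is mine):

    // Maximum number of containers = total cluster memory / minimum allocation.
    static long maxContainers(long clusterMemMb, long minAllocMb) {
        return clusterMemMb / minAllocMb;
    }

    // maxContainers(100 * 1024, 1024) == 100
    // maxContainers(100 * 1024, 4096) == 25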

Each application will get the memory it asks for, rounded up to the next container size. So if the minimum is 4GB and you ask for 4.5GB, you will get 8GB.
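
That round-up rule is simple integer arithmetic; a minimal sketch (again illustrative Java, not YARN source code):

    // Round a request up to the next multiple of the minimum allocation.
    static long roundUp(long requestMb, long minAllocMb) {
        return ((requestMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
    }

    // roundUp(4608, 4096) == 8192: asking for 4.5GB with a 4GB minimum yields 8GB.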

If the job's/task's memory usage grows beyond the allocated container size, the NodeManager will kill that container.

Now, coming back to your original question: how does the YARN ApplicationMaster decide how much memory and CPU is required for a particular job?

The YARN ResourceManager (RM) allocates resources to the application through logical queues, which include memory, CPU, and disk resources.

By default, the RM will allow up to 8192MB ("yarn.scheduler.maximum-allocation-mb") for an Application Master (AM) container allocation request.

The default minimum allocation is 1024MB ("yarn.scheduler.minimum-allocation-mb").

The AM can only request resources from the RM that are in increments of ("yarn.scheduler.minimum-allocation-mb") and do not exceed ("yarn.scheduler.maximum-allocation-mb").

The AM is responsible for rounding off ("mapreduce.map.memory.mb") and ("mapreduce.reduce.memory.mb") to a value divisible by the ("yarn.scheduler.minimum-allocation-mb").

The RM will deny an allocation greater than 8192MB or a value not divisible by 1024MB.
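
Putting the minimum/maximum rules together, here is a sketch of the check applied to an AM request, using the default values above (illustrative only; allocate is my name, not a YARN API):

    // Illustrative sketch of the allocation rules described above,
    // not actual ResourceManager code.
    static long allocate(long requestMb) {
        final long MIN_MB = 1024; // yarn.scheduler.minimum-allocation-mb (default)
        final long MAX_MB = 8192; // yarn.scheduler.maximum-allocation-mb (default)
        long normalized = ((requestMb + MIN_MB - 1) / MIN_MB) * MIN_MB; // multiple of MIN_MB
        if (normalized > MAX_MB) {
            throw new IllegalArgumentException("RM denies requests above " + MAX_MB + "MB");
        }
        return normalized;
    }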

The following YARN and MapReduce parameters need to be set to change the default memory requirements (a per-job usage sketch follows the lists):

For YARN

  1. yarn.scheduler.minimum-allocation-mb
  2. yarn.scheduler.maximum-allocation-mb
  3. yarn.nodemanager.vmem-pmem-ratio
  4. yarn.nodemanager.resource.memory-mb

For MapReduce

  1. mapreduce.map.java.opts
  2. mapreduce.map.memory.mb
  3. mapreduce.reduce.java.opts
  4. mapreduce.reduce.memory.mb
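
For example, the MapReduce settings can be supplied per job through Hadoop's standard Configuration API (the values below are illustrative, not recommendations; the yarn.* settings are cluster-side and live in yarn-site.xml):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class MemoryConfigExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Container sizes requested from YARN for map and reduce tasks.
            conf.set("mapreduce.map.memory.mb", "2048");
            conf.set("mapreduce.reduce.memory.mb", "4096");
            // JVM heap is conventionally kept below the container size (~80%)
            // to leave headroom for non-heap memory.
            conf.set("mapreduce.map.java.opts", "-Xmx1638m");
            conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");
            Job job = Job.getInstance(conf, "memory-config-example");
            // ... set mapper/reducer classes, input/output paths, and submit as usual.
        }
    }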

So the conclusion is that the ApplicationMaster doesn't use any logic to calculate the resource (memory/CPU) requirements for a particular job. It simply uses the above-mentioned parameter values. If a job exceeds the given container size (including virtual memory), the NodeManager simply kills the container.

Rahul Sharma
  • 91
  • 1
  • 5