What is a container in YARN? Is it the same as the child JVM in which tasks run on the NodeManager, or is it different?
9 Answers
It represents a resource (memory) on a single node in a given cluster.
A container is
- supervised by the node manager
- scheduled by the resource manager
One MR task runs in such container(s).

- An MR task does not run in such a container. It runs on a set of containers, as each map or reduce function runs on one container. A task could run in *uber* mode on one container, but a task usually spans hundreds or thousands of containers managed by the `MRAppMaster`. Also, a container is described by a rich resource vector and does not represent memory exclusively. – Dyin Nov 26 '14 at 15:55
- Thanks for pointing this out, you're right, I updated the answer. However, when I answered this question more or less 2 years ago, a container only represented a memory resource. – Lorand Bendig Dec 18 '14 at 07:47
- What's the relationship between containers and executors? Is each executor running in one container? Thanks! – lucian Feb 02 '15 at 02:44
- Actually the original definition was correct. An MR *job* comprises a set of tasks, each task running in one container. – marcorossi May 10 '15 at 23:02
There can be multiple containers on a single Node (or a single very big one).
Every node in the system is considered to be composed of multiple containers of a minimum memory size (say 512 MB or 1 GB). The ApplicationMaster can request any container as a multiple of the minimum memory size.
Source, see section ResourceManager/Resource Model.
- AFAIK, the ApplicationMaster can request any size, but the YARN Scheduler only allocates in multiples of the minimum memory size defined in the yarn.scheduler.minimum* class of properties. – ᐅdevrimbaris Dec 27 '16 at 14:28
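The rounding described in the comment above can be sketched in a few lines of plain Java. This is only an illustration under assumptions: `ContainerSizing` and `roundUpToMinimumMb` are hypothetical names, not part of the YARN API, and `minMb` merely stands in for a setting such as `yarn.scheduler.minimum-allocation-mb`. The exact normalization in a real cluster depends on the scheduler and its configuration.

```java
// Sketch only: models rounding a memory request up to a multiple of the
// scheduler's minimum allocation. Class and method names are hypothetical.
final class ContainerSizing {

    /**
     * Rounds requestedMb up to the next multiple of minMb.
     * Requests smaller than minMb are raised to minMb.
     */
    static int roundUpToMinimumMb(int requestedMb, int minMb) {
        if (minMb <= 0) {
            throw new IllegalArgumentException("minMb must be positive");
        }
        int atLeastMin = Math.max(requestedMb, minMb);
        // Integer ceiling division, then scale back up to megabytes.
        return ((atLeastMin + minMb - 1) / minMb) * minMb;
    }

    public static void main(String[] args) {
        // Made-up values: a 1500 MB request with a 1024 MB minimum.
        System.out.println(roundUpToMinimumMb(1500, 1024));
    }
}
```

With these made-up numbers, an ApplicationMaster asking for 1500 MB would be granted a 2048 MB container, which is why requested and allocated sizes often differ.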
The word 'Container' is used in YARN in two contexts:
Container: Signifies resources allocated to an ApplicationMaster. The ResourceManager is responsible for issuing resources/containers to an ApplicationMaster. Check the Container API.
Launching a Container: Based on the allocated resources (containers), the ApplicationMaster requests the NodeManager to start Containers, which results in a task executing on a node. Check the ContainerManager API.

In Hadoop 2.x, a Container is a place where a unit of work occurs. For instance, each MapReduce task (not the entire job) runs in one container.
An application/job runs on one or more containers.
A set of system resources is allocated to each container; currently CPU cores and RAM are supported. Each node in a Hadoop cluster can run several containers.
In Hadoop 1.x, a slot is allocated by the JobTracker to run each MapReduce task. The TaskTracker then spawns a separate JVM for each task (unless JVM reuse is enabled).
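To make the point about per-container resources concrete, here is a rough sketch of how many containers of a given size could fit on one node when both memory and CPU cores are counted. `NodeCapacity` and `Resource` are hypothetical illustration types, not YARN classes, and the numbers are made up. Whether a real scheduler takes cores into account depends on its configured resource calculator.

```java
// Sketch only: a toy resource vector (memory + cores) and a capacity check.
// None of these types exist in Hadoop; they are for illustration.
final class NodeCapacity {

    /** Hypothetical resource vector: memory in MB plus CPU cores. */
    static final class Resource {
        final int memoryMb;
        final int cores;

        Resource(int memoryMb, int cores) {
            this.memoryMb = memoryMb;
            this.cores = cores;
        }
    }

    /** Number of identical containers that fit, limited by the scarcer resource. */
    static int containersThatFit(Resource node, Resource perContainer) {
        if (perContainer.memoryMb <= 0 || perContainer.cores <= 0) {
            throw new IllegalArgumentException("container resources must be positive");
        }
        int byMemory = node.memoryMb / perContainer.memoryMb;
        int byCores = node.cores / perContainer.cores;
        return Math.min(byMemory, byCores);
    }

    public static void main(String[] args) {
        Resource node = new Resource(8192, 4);      // made-up node size
        Resource container = new Resource(1024, 2); // made-up container size
        System.out.println(containersThatFit(node, container));
    }
}
```

In this example memory alone would allow eight containers, but the core count limits the node to two, which is one way a container differs from a fixed Hadoop 1.x slot.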

In simple terms, a Container is a place where a YARN application runs. Containers are available on each node. The ApplicationMaster negotiates containers with the Scheduler (one of the components of the ResourceManager). Containers are launched by the NodeManager.

Depending on the size of the input data, multiple input splits are created. The MR job needs to process all of this data, so multiple tasks (map and reduce tasks) are created, and each input split is processed by one task. Where each task runs is decided by the ResourceManager, which knows which NodeManagers are free and which are busy. It is like the principal of a college, with the NodeManagers as class teachers: the principal knows which teacher is free. So it asks a NodeManager to run that task (a small fraction of the entire job) in a container, i.e. a memory area such as a JVM. The job itself runs as an ApplicationMaster inside a container.

The Container is the resource allocation, which is the successful result of the ResourceManager granting a specific ResourceRequest. A Container grants rights to an application to use a specific amount of resources (memory, cpu etc.) on a specific host.

Container:
The term is used interchangeably for the logical lease on resources and for the actual process spawned on the node. It is the same process in which tasks (or the AM) run. To start a container, we provide the container object and a CLC (ContainerLaunchContext), in which we set the list of commands that run the task (or the AM).
nmClient.startContainer(container, clcObj)
ContainerLaunchContext code snippet:
<code>
.
.
.
/**
* Add the list of <em>commands</em> for launching the container. All
* pre-existing List entries are cleared before adding the new List
* @param commands the list of <em>commands</em> for launching the container
*/
@Public
@Stable
public abstract void setCommands(List<String> commands);
</code>
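As a rough illustration of what the `commands` list passed to `setCommands` might contain, the sketch below assembles a single shell command line in plain Java, without depending on Hadoop. `LaunchCommand`, `build`, the main class name, and the heap size are all hypothetical. The `$JAVA_HOME` variable and the `<LOG_DIR>` placeholder are assumed here to be expanded on the node at launch time, so check your Hadoop version before relying on them.

```java
import java.util.Collections;
import java.util.List;

// Sketch only: builds the kind of command list a ContainerLaunchContext
// could be given. This helper is hypothetical and not part of YARN.
final class LaunchCommand {

    /** Returns a one-element list holding the full command line for a JVM task. */
    static List<String> build(String mainClass, int heapMb) {
        String command = String.join(" ",
                "$JAVA_HOME/bin/java",   // assumed to be resolved on the node
                "-Xmx" + heapMb + "m",   // heap should fit inside the container's memory
                mainClass,
                "1><LOG_DIR>/stdout",    // assumed log-directory placeholder
                "2><LOG_DIR>/stderr");
        return Collections.singletonList(command);
    }

    public static void main(String[] args) {
        // In a real ApplicationMaster this list would be handed to
        // ContainerLaunchContext.setCommands(...) before startContainer.
        System.out.println(build("com.example.MyTask", 512));
    }
}
```

Keeping the heap size below the container's allocated memory matters, since the container is a resource lease and a process that outgrows it may be killed by the NodeManager.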

A Container is a place where the application runs its tasks. If you want to know the total number of running containers in a cluster, you can check the YARN ResourceManager UI of your cluster.
YARN URL: http://Your-Active-ResourceManager-IP:45020/cluster/apps/RUNNING
The "Running containers" column shows the number of running containers for each application.
Note: If you are using Spark, the Spark executors run inside containers. Each Spark executor runs in its own container.
