What is the relation between the number of NodeManagers and the number of DataNodes, so that I don't get a "running beyond physical memory limits" exception for containers?
-
Your question is not clear. What do you mean by: "so I can't have beyond physical memory bound exception for containers ?" – Manjunath Ballur Jul 14 '16 at 12:15
-
This is the error I encounter: Container is running beyond physical memory limits. Current usage: 170.2 MB of 170 MB physical memory used; 778.4 MB of 357.0 MB virtual memory used. Killing container. Container killed on request. Exit code is 143. I assume there is a mathematical relation between the number of DataNodes and the number of NodeManagers that would let me run an application without this error. – Marius Cristian Eseanu Jul 14 '16 at 17:05
1 Answer
Node Manager and Data Node correlation
There is a 1:1 correlation between the number of Node Managers and the number of Data Nodes:
- Node Managers manage the containers requested by jobs
- Data Nodes manage the data
Hadoop is designed so that compute (Node Managers) runs as close to the data (Data Nodes) as possible; containers for jobs are usually allocated on the same nodes where the data is present.
Hence, in a typical Hadoop cluster, the Data Node and the Node Manager run on the same machine.
Memory Issue:
Typically, you face memory issues when the Node Manager related settings in yarn-site.xml are wrong.
To get the Node Manager settings right, you can check the answers provided in this question: MapReduce job hangs, waiting for AM container to be allocated. Check the settings specified in the yarn-site.xml and mapred-site.xml files.
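For illustration, here is a minimal sketch of the properties that usually drive this error. The numeric values below are assumptions sized for a worker node with 8 GB of RAM, not figures from the linked answer; adapt them to your hardware. Note that the 357.0 MB virtual memory limit in the error above is just the 170 MB physical limit multiplied by the default yarn.nodemanager.vmem-pmem-ratio of 2.1.

yarn-site.xml (illustrative values):

<!-- Total RAM the Node Manager may hand out to containers (assumed 8 GB node) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<!-- Smallest and largest container YARN will allocate -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>
<!-- Virtual memory allowed per MB of physical memory (default is 2.1) -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>

mapred-site.xml (illustrative values):

<!-- Physical memory limits for map and reduce containers -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<!-- JVM heap kept around 80% of the container size -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m</value>
</property>

A common rule of thumb is to keep the JVM heap (-Xmx) around 80% of the corresponding *.memory.mb value, so that the JVM's non-heap overhead does not push the container past its physical limit.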
To understand YARN configuration tuning, I found this to be a very good source: http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_ig_yarn_tuning.html
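As a quick back-of-the-envelope check (a sketch based on the illustrative values above, not figures from the tuning guide): with yarn.nodemanager.resource.memory-mb = 8192 and mapreduce.map.memory.mb = 2048, each Node Manager can run at most 8192 / 2048 = 4 map containers concurrently. A container that exceeds its configured physical limit, like the 170.2 MB of 170 MB usage in the error above, is killed by YARN with exit code 143.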
