10

Hi we have recently upgraded to yarn from mr1. I know that container is an abstract notion but I don't understand how many jvm task (map, reduce, filter etc) one container can spawn or other way to ask is is container reusable across mutltiple map or reduce tasks. I read in following blog : What is a container in YARN?

"each mapper and reducer runs on its own container to be accurate!" which means if I look at AM logs I should see number of container allocated equal to number of map tasks (failed|success) plus number of reduce task is that correct?

I know number of containers changes during Application life cycle, based on AM requests, splits, scheduler etc.

But is there a way to request initial number of minimum container for given application. I think one way is to configure fair-scheduler queue. But is there anything else that can dictate this?

In case of MR if I have mapreduce.map.memory.mb = 3gb and mapreduce.map.cpu.vcores=4. I also have yarn.scheduler.minimum-allocation-mb = 1024m and yarn.scheduler.minimum-allocation-vcores = 1.

Does that mean I will get one container with 4 cores or 4 containers with one core?

Also its not clear where can you specify mapreduce.map.memory.mb and mapreduce.map.cpu.vcores. Should they be set in client node or can they be set per application as well?

Also from RM UI or AM UI is there a way to see currently assigned containers for given application?

Community
  • 1
  • 1
nir
  • 3,743
  • 4
  • 39
  • 63

1 Answers1

12
  1. Container is a logical entity. It grants an application to use specific amount of resources (memory, CPU etc.) on a specific host (Node Manager). A container can not be re-used across map and reduce tasks for the same application.

For e.g. I have a Mapreduce application, which spawns 10 mappers: Number of mappers

I am running this on a single host with 8 vCores (this value is determined by the configuration parameter: yarn.nodemanager.resource.cpu-vcores). By default, this is set to 8. Please check "YarnConfiguration.java"

  /** Number of Virtual CPU Cores which can be allocated for containers.*/
  public static final String NM_VCORES = NM_PREFIX + "resource.cpu-vcores";
  public static final int DEFAULT_NM_VCORES = 8;

Since there are 10 mappers and 1 Application master, total number of containers spawned is 11. enter image description here

So, for each map/reduce task a different container gets launched.

But, in Yarn, for MapReduce jobs, there is a concept of a Uber job, which enables the user to use a single container for multiple mappers and 1 reducer (https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml: CURRENTLY THE CODE CANNOT SUPPORT MORE THAN ONE REDUCE and will ignore larger values.).

  1. There is no configuration parameter available to specify the minimum number of the containers. It is the responsibility of the Application Master to request the number of containers needed.

  2. yarn.scheduler.minimum-allocation-mb - Determines the minimum allocation of memory for each container (yarn.scheduler.maximum-allocation-mb determines the maximum allocation for every container request)

    yarn.scheduler.minimum-allocation-vcores - Determines the minumum allocation of vCores for each container (yarn.scheduler.maximum-allocation-vcores determines the maximum allocation for every container request)

    In your case, you are requesting "mapreduce.map.memory.mb = 3m (3MB) and mapreduce.map.cpu.vcores = 4 (4 vCores).

    So, you will get 1 container with 4 vCores for each mapper (assuming yarn.scheduler.maximum-allocation-vcores is >= 4)

  3. The parameters "mapreduce.map.memory.mb" and "mapreduce.map.cpu.vcores" are set in the mapred-site.xml file. If this configuration parameter is not "final", then it can be overridden in the client, before submitting the job.

  4. Yes. From the "Application Attempt" page for the application, you can see the number of allocated containers. Check the attached figure above.

Manjunath Ballur
  • 6,287
  • 3
  • 37
  • 48
  • 2. How does Application Master request number of containers needed. For example in spark yarn-client mode I know it's defined by spark.executor.instances but how does that translate into something that is understandable by yarn. Simply question is what yarn properties allows application master to set number of containers. I see following in Hadoop AM example. `rsrcRequest.setNumContainers(numContainers);` can you confirm its been set programmatically not via any parameter – nir Oct 09 '15 at 16:25
  • 5. Is "Application Attempt" page is a new feature? I don't see it in 2.5.1 – nir Oct 09 '15 at 16:25
  • 1
    You are referring to example here: https://hadoop.apache.org/docs/r0.23.11/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html. This example talks about writing an Application Master. An AM has to decide how many containers it needs and then calls RM with resource request. While requesting resources, it asks for containers by calling "setNumContainers" method. My answer for 2nd point meant the same. – Manjunath Ballur Oct 09 '15 at 16:39
  • 1
    For point 5. I am sorry, I did not know your Hadoop version. I am using Hadoop 2.7.1. From the main YARN RM UI: When you click on application ID, It takes you to application's page. In the application page, at the bottom, it shows all the attempts. When you click on an attempt ID, it takes you to attempt page. In the attempt page, it shows total number of allocated containers and logs for each of the containers. – Manjunath Ballur Oct 09 '15 at 16:41
  • Thanks for clarification. Yes I did understood your point 2. I just wanted to know what mechanism AM has to use to do that. – nir Oct 09 '15 at 16:42
  • For point 2, this Cloudera blog gives some idea: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_yarn_tuning.html. For mapreduce jobs, the number of containers will be equal to number of mappers + number of reducers. After calculating the number of mappers (for e.g. based on split size) and reducers, the AM requests for those many number of containers from RM, by calling "setNumContainer()" method. – Manjunath Ballur Oct 09 '15 at 16:49