
I hear there is a way to assign 32 cores (or however many cores your machine has) to a single container in Hadoop 2.7 YARN.

Is this possible, and does someone have a sample configuration showing what I need to change to achieve it?

The test would be TeraSort, giving all 40 of my cores to a single-container job.

Chad S.
Queasy

2 Answers


For vCores, the relevant configuration parameter is:

yarn.scheduler.maximum-allocation-vcores - specifies the maximum number of vCores that can be allocated for a single container request.

You typically set this in yarn-site.xml. Container requests asking for more vCores than this maximum are rejected by YARN, so set it at least as high as the per-container core count you want (e.g. 32):

  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>32</value>
  </property>

If this value is not set, the YARN ResourceManager falls back to the hard-coded default of 4:

public static final int DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES = 4;

If you are running a MapReduce application, then you also need to set two more configuration parameters in mapred-site.xml:

  • mapreduce.map.cpu.vcores - The number of vCores to request from the scheduler for map tasks
  • mapreduce.reduce.cpu.vcores - The number of vCores to request from the scheduler for the reduce tasks
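As a minimal sketch for the TeraSort test in the question, the corresponding mapred-site.xml entries might look like this (the value 40 is taken from the question's core count; adjust it to your hardware):

```xml
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>40</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>40</value>
</property>
```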

The resource calculation for your mapper/reducer requests is done in the scheduler code. If you want the scheduler to consider both memory and CPU, you need to use "DominantResourceCalculator"; otherwise only memory is considered and your vCore settings are ignored.

For example, if you are using the Capacity Scheduler, you need to specify the following property in the "capacity-scheduler.xml" file:

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>

Please check this link, which gives a detailed description of the various configuration parameters: http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_ig_yarn_tuning.html

Manjunath Ballur
  • I'm trying this today and I'll give you a green if it works. I wasn't aware of the limitation of 32 cores, thanks – Queasy Oct 19 '15 at 15:45
  • Please see the description of "yarn.scheduler.maximum-allocation-vcores" here: https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml. It states 32 as the default value. You can try with 40 and check. – Manjunath Ballur Oct 19 '15 at 15:56
  • I've been running 40 and seems to be working fine but until I get 32 cores or 40 working on 1 container I'll just be using 32 cores to be safe, then I'll up the core count if indeed it works. Thanks! – Queasy Oct 19 '15 at 20:06
  • Still showing 1 core per container. Attached mapred xml below. Still in progress to make it work.. – Queasy Oct 19 '15 at 21:36
  • Sorry, I don't see any attachment. I hope you also took care of scheduler settings. If you do not use "DominantResourceCalculator", vCores won't be considered for resource calculation. Only memory will be considered. – Manjunath Ballur Oct 20 '15 at 06:16
  • I inserted this above from another comment, here is the link: http://pastie.org/10493308 – Queasy Oct 20 '15 at 15:35
  • Your mapred-site.xml settings look fine. Can you tell me the values of "yarn.nodemanager.resource.cpu-vcores" and "yarn.scheduler.maximum-allocation-vcores" in your yarn-site.xml? – Manjunath Ballur Oct 20 '15 at 15:51
  • I have yarn.scheduler.maximum-allocation-vcores set at 32 cores. and yarn.scheduler.minimum-allocation-vcores @ 32 but I've also tried 1 through 32 but no change in the amount of cores per container. Thanks for the comment. – Queasy Oct 20 '15 at 15:58
  • I am asking about: yarn.nodemanager.resource.cpu-vcores. I have answered a similar question here: http://stackoverflow.com/questions/33099601/how-are-containers-created-based-on-vcores-and-memory-in-mapreduce2/33130620#33130620. Can you compare your settings? In my answer I forgot about this setting: yarn.nodemanager.resource.cpu-vcores. – Manjunath Ballur Oct 20 '15 at 16:06
  • Thanks Manjunath Ballur! The capacity-scheduler.xml and changing the yarn.scheduler.capacity.resource-calculator did the trick and now I have 40 cores. Thanks for the link and information provided. – Queasy Oct 20 '15 at 17:29

Honestly, I don't know much about Hadoop 2.7, but if the mapper is able to utilize more threads, the number of cores per map (or reduce) container can be set with these properties in the mapred-site.xml file:

mapreduce.map.cpu.vcores - The number of virtual cores to request from the scheduler for each map task.

mapreduce.reduce.cpu.vcores - The number of virtual cores to request from the scheduler for each reduce task.

Please refer to the Hadoop documentation.
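Since a typical mapper runs in a single thread (as the comment below points out), a container with many vCores will not automatically use them. A hedged, untested sketch of one way to exploit them is Hadoop's MultithreadedMapper wrapper (a real class in org.apache.hadoop.mapreduce.lib.map); note that MyMapper, the thread count of 40, and the job name here are illustrative assumptions, and your map logic must be thread-safe:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultithreadedExample {

    // A trivial, thread-safe identity mapper for illustration only.
    public static class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ask the scheduler for matching vCores (only honored when
        // DominantResourceCalculator is enabled, as described above).
        conf.setInt("mapreduce.map.cpu.vcores", 40);

        Job job = Job.getInstance(conf, "multithreaded-mapper-sketch");
        // MultithreadedMapper runs N copies of the real mapper
        // concurrently inside a single map task / container.
        job.setMapperClass(MultithreadedMapper.class);
        MultithreadedMapper.setMapperClass(job, MyMapper.class);
        MultithreadedMapper.setNumberOfThreads(job, 40);
        // ... set input/output formats and paths, then job.waitForCompletion(true)
    }
}
```

This sketch needs the Hadoop client libraries on the classpath to compile and a running cluster to execute, so it is not standalone-runnable; it only illustrates the wiring.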

vanekjar
  • I have referred to the hadoop documentation but no related information. I tried the above which I already setup but both were setup @ 1, and so I changed to 40 cores but still it didn't work. – Queasy Oct 16 '15 at 20:07
  • Maybe the problem is somewhere else. Are you aware that a typical mapper uses just one core, since it runs in a single thread? Even though you give the container more cores, it will use just the one. Could you post the source code of your mapper? – vanekjar Oct 17 '15 at 08:56