
Does the max memory for a Reduce task need to be more than that of a Map task in a MapReduce application on YARN? Like below...

mapreduce.map.memory.mb = 7
mapreduce.reduce.memory.mb = 14
mapreduce.map.java.opts = 0.8 * 7 = 5.6
mapreduce.reduce.java.opts = 0.8 * 2 * 7 = 11.2
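
If 7 and 14 are meant as gigabytes, a rough sketch of how these settings would look in mapred-site.xml (the *.memory.mb properties take megabytes, and the -Xmx heap is set to roughly 80% of the container size):

<!-- mapred-site.xml: sketch of the question's settings, assuming 7 and 14 are GB -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>7168</value>        <!-- 7 GB map container -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>14336</value>       <!-- 14 GB reduce container -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx5734m</value>   <!-- ~0.8 * 7168 MB heap -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx11468m</value>  <!-- ~0.8 * 14336 MB heap -->
</property>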
user1965449

1 Answer


There is no hard and fast rule that reduce task memory must be greater than map task memory.

By default, both mapreduce.map.memory.mb and mapreduce.reduce.memory.mb are set to 1,024 MB. There are lower and upper limits on these values, imposed by yarn.scheduler.minimum-allocation-mb (default 1,024 MB) and yarn.scheduler.maximum-allocation-mb (default 8,192 MB), respectively.
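
For reference, a minimal yarn-site.xml sketch of those two scheduler limits at their default values; container requests below the minimum are rounded up, and requests above the maximum are rejected by the ResourceManager:

<!-- yarn-site.xml: scheduler bounds on container size (defaults shown) -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>   <!-- smallest container YARN will allocate -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>   <!-- largest container a single task may request -->
</property>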

Generally, though, it is recommended that the reducer memory settings be higher than the mapper memory settings. The reason is that the number of reducers is usually smaller than the number of mappers, and each reducer aggregates records from many mappers. You can also optimize the shuffle and sort phase by tuning the reducer's memory configuration parameters, such as mapreduce.reduce.shuffle.input.buffer.percent (the percentage of the reducer's heap used for buffering map outputs during the shuffle).
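
As an illustration, here is how that shuffle knob would be set in mapred-site.xml; 0.70 is the usual default, and raising it gives the copy phase a larger share of the reducer's heap for buffering map outputs:

<!-- mapred-site.xml: fraction of reducer heap used to buffer map outputs during shuffle -->
<property>
  <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
  <value>0.70</value>
</property>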

Cloudera recommends setting reduce task memory to twice the map task's memory: http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_ig_yarn_tuning.html

You can also check the settings used for various AWS EMR instance types here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/TaskConfiguration_H2.html. Note that mapreduce.reduce.memory.mb is always greater than or equal to mapreduce.map.memory.mb.

Manjunath Ballur