I would like an expert's answer on this scenario:

Say I have a 150 MB file stored in 3 blocks of at most 64 MB each. By default, my MapReduce job will start 3 mappers.

If I want to increase or decrease the number of mappers, what is the command?

If I try to increase it in the middle of the process, what will happen, given that I have only 3 blocks to process? Will the running job pick up the new number of mappers, or how will it behave?

Can the experts shed some light on this concept?

Thank you

Ravi

1 Answer

This should help you:

Number of Maps

The number of maps is usually driven by the number of DFS blocks in the input files, although that causes people to adjust their DFS block size to adjust the number of maps. The right level of parallelism for maps seems to be around 10-100 maps/node, although we have taken it up to 300 or so for very cpu-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.

Actually controlling the number of maps is subtle. The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps. The default InputFormat behavior is to split the total number of bytes into the right number of fragments. However, in the default case the DFS block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size. Thus, if you expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k maps, unless your mapred.map.tasks is even larger. Ultimately the InputFormat determines the number of maps.
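To make the arithmetic in that paragraph concrete, here is a minimal sketch of the split-size rule as I understand it from the description above: the hint gives a goal size, the block size caps it, and mapred.min.split.size puts a floor under it. The class and method names are made up for illustration and nothing here calls Hadoop itself; the map count is only approximated by a ceiling division over one large input.

```java
// Illustrative sketch only: hypothetical names, no Hadoop classes involved.
final class SplitSizeSketch {

    // Split size = max(minSplitSize, min(totalBytes / hint, blockSize)).
    static long computeSplitSize(long totalBytes, int mapTasksHint,
                                 long minSplitSize, long blockSize) {
        long goalSize = totalBytes / Math.max(1, mapTasksHint);
        return Math.max(minSplitSize, Math.min(goalSize, blockSize));
    }

    // Approximate number of maps: input size divided by split size, rounded up.
    static long approxMaps(long totalBytes, long splitSize) {
        return (totalBytes + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long tb = mb * mb;
        long total = 10 * tb;
        long block = 128 * mb;

        // Small hint: the block size is the upper bound, so about 82k maps.
        System.out.println(approxMaps(total, computeSplitSize(total, 2, 1, block)));      // 81920

        // Hint larger than the block-driven count: splits shrink below a block.
        System.out.println(approxMaps(total, computeSplitSize(total, 163840, 1, block))); // 163840

        // Raising the minimum split size to 256 MB halves the number of maps.
        System.out.println(approxMaps(total, computeSplitSize(total, 2, 256 * mb, block))); // 40960
    }
}
```

The three cases line up with the text: the block size wins when the hint is small, a larger hint produces more maps, and a larger minimum split size produces fewer.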

The number of map tasks can also be increased manually using the JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but will not set the number below that which Hadoop determines via splitting the input data.
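Applied to the 150 MB file from the question, the behaviour can be sketched roughly as follows. This is a simplified, standalone model of how I understand the old-API FileInputFormat to cut a single file into splits, including its 10% tolerance on the last split; the class name, method name, and the value of SPLIT_SLOP are assumptions for illustration, not calls into Hadoop.

```java
// Illustrative sketch only: a simplified model of split counting for one file.
final class MapCountSketch {

    // Assumed tolerance: a tail of up to 10% over the split size is folded
    // into the last split instead of getting its own.
    static final double SPLIT_SLOP = 1.1;

    static int countSplits(long fileBytes, long blockSize,
                           long minSplitSize, int mapTasksHint) {
        long goalSize = fileBytes / Math.max(1, mapTasksHint);
        long splitSize = Math.max(minSplitSize, Math.min(goalSize, blockSize));

        int splits = 0;
        long remaining = fileBytes;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long file = 150 * mb;
        long block = 64 * mb;

        System.out.println(countSplits(file, block, 1, 1));   // 3: hint below 3 is ignored
        System.out.println(countSplits(file, block, 1, 3));   // 3
        System.out.println(countSplits(file, block, 1, 6));   // 6: hint raises the count
        System.out.println(countSplits(file, block, 1, 10));  // 10
        System.out.println(countSplits(file, block, 128 * mb, 1)); // 2: larger min split size lowers it
    }
}
```

In this model a hint smaller than 3 still yields 3 maps, a larger hint yields more, and the only way to go below 3 is to raise the minimum split size above the block size.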

More details at - https://wiki.apache.org/hadoop/HowManyMapsAndReduces

I am not sure whether you can change it during job execution; as far as I know, this has to be set before the job is launched.

Anirudh
  • A relevant discussion here - http://stackoverflow.com/questions/6885441/setting-the-number-of-map-tasks-and-reduce-tasks – Anirudh Mar 25 '16 at 08:56