
I have gone through a lot of blog posts on Stack Overflow and also the Apache wiki to understand how the number of mappers is set in Hadoop. I also went through the post *hadoop - how total mappers are determined*. Some say it's based on the InputFormat, and some posts say it's based on the number of blocks the input file is split into.

Somehow I am still confused by the default settings.

When I run a wordcount example I see that the number of mappers is as low as 2. What is really happening in the settings? Also, in this [example program](http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/QuasiMonteCarlo.java), they set the number of mappers based on user input. How can one do this setting manually?

I would really appreciate some help in understanding how mappers work.

Thanks in advance


1 Answer


Use the Hadoop configuration properties mapred.min.split.size and mapred.max.split.size to guide Hadoop toward the split size you want. This won't always work, particularly when your data is in a compression format that is not splittable (e.g. gzip; bzip2, by contrast, is splittable).

So if you want more mappers, use a smaller split size. Simple!
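
For concreteness, here is a minimal driver sketch (my own illustration, not code from the original answer) showing where those properties go. It assumes the standard wordcount mapper/reducer classes exist elsewhere; the class name SplitSizeDemo and the 32 MB cap are illustrative assumptions, not recommendations:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitSizeDemo { // hypothetical driver class, for illustration
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // A smaller max split size => more splits => more mappers.
    // (Newer Hadoop versions use the equivalent keys
    // mapreduce.input.fileinputformat.split.minsize / .maxsize.)
    conf.setLong("mapred.min.split.size", 1L);
    conf.setLong("mapred.max.split.size", 32L * 1024 * 1024); // 32 MB, illustrative

    Job job = Job.getInstance(conf, "wordcount-small-splits");
    job.setJarByClass(SplitSizeDemo.class);
    // job.setMapperClass(...); job.setReducerClass(...); // as in plain wordcount

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

If your driver goes through ToolRunner/GenericOptionsParser, you can get the same effect without code changes by passing -D mapred.max.split.size=... on the command line.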

(Updated as requested) This won't work for many small files; in particular, you'll end up with more mappers than you want. For that situation use CombineFileInputFormat; for Scalding, this SO answer explains how: Create Scalding Source like TextLine that combines multiple files into single mappers.
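
As a hedged sketch of the small-files fix on the plain Java side (the Scalding link above covers the Scalding side): CombineTextInputFormat, the ready-made text subclass of CombineFileInputFormat in the newer mapreduce API, packs many small files into fewer splits. The class name CombineSmallFilesDemo and the 128 MB cap below are illustrative assumptions:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombineSmallFilesDemo { // hypothetical driver class, for illustration
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "wordcount-combined");
    job.setJarByClass(CombineSmallFilesDemo.class);
    // job.setMapperClass(...); job.setReducerClass(...); // as in plain wordcount

    // Pack many small files into each split, so one mapper reads several files.
    job.setInputFormatClass(CombineTextInputFormat.class);
    // Cap each combined split at ~128 MB (illustrative value).
    CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```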

  • Hello @Sam: Thanks for your answer. I get your point about deciding the number of mappers via mapred.min.split.size, but my input sizes are so small that this method doesn't really help. I found a workaround where one can decide based on the InputFormat, which can be set more flexibly ([example](http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/)). Thanks – user3560220 Jul 18 '14 at 09:49