43

I'm new to Hadoop and getting familiar with the MapReduce programming style, but I've run into a problem: sometimes a job needs only the map phase, and I want the map output written directly as the job output, so no reduce phase is needed. How can I achieve that?

Breakinen
  • 1
    Check this [Map-only Jobs](http://www.unmeshasreeveni.blogspot.in/2014/05/map-only-jobs-in-hadoop.html) – USB May 05 '14 at 06:04

4 Answers

59

This turns off the reducer.

job.setNumReduceTasks(0);

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)
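
For context, here is a minimal sketch of a complete map-only driver built around that call. The class names MapOnlyDriver and UpperCaseMapper, the buildJob helper, and the upper-casing logic are illustrative assumptions and not part of the original answer; the input and output paths are taken from the command line.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {

    // Hypothetical mapper: emits every input line upper-cased, with no value.
    public static class UpperCaseMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(line.toString().toUpperCase()), NullWritable.get());
        }
    }

    // Configures the job; kept separate from main so it can be inspected without submitting.
    public static Job buildJob(Configuration conf, String in, String out) throws IOException {
        Job job = Job.getInstance(conf, "map-only example");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(UpperCaseMapper.class);

        // Zero reduce tasks: no reducer is run for this job.
        job.setNumReduceTasks(0);

        // With no reducer, these describe the mapper's output, which is the job's output.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.addInputPath(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = buildJob(new Configuration(), args[0], args[1]);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With zero reduce tasks, each mapper writes its output straight to the output directory, in files named part-m-NNNNN rather than part-r-NNNNN.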

Thomas Jungblut
9

You can also use the IdentityReducer:

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/IdentityReducer.html

Peter Wippermann
  • 1
    Thank you Peter, I read the source of IdentityReducer, it's really what I meant to do, but are there any method to directly output the map result to HDFS without reduce? (you know the shuffle phase costs lots of bandwidth and cpu/memory resource) – Breakinen Feb 23 '12 at 15:31
  • IdentityMapper can be used with or without a follow-on reducer. If you use the identity mapper to jump straight through to the reduce stage, you still have the sort-and-shuffle and I/O overhead, so the method mentioned by Thomas is the right way to go if you don't need a reducer. – omnisis Feb 14 '13 at 07:45
  • 3
    I'm sorry omnisis, but that's not correct: Setting the number of reduce tasks to zero will omit any sorting. http://stackoverflow.com/questions/10630447/hadoop-difference-between-0-reducer-and-identity-reducer – Peter Wippermann Feb 15 '13 at 10:02
5

This is useful when you need to launch a map-only job from the terminal. You can turn off the reducers by specifying 0 reducers directly in the hadoop jar command:

-D mapred.reduce.tasks=0 

The resulting command looks like this:

hadoop jar myJob.jar -D mapred.reduce.tasks=0 -input myInputDirs -output myOutputDir

To be backward compatible, Hadoop also supports the "-reduce NONE" option, which is equivalent to "-D mapred.reduce.tasks=0".
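
For a job written in Java, a -D option on the command line only takes effect if the driver passes its arguments through Hadoop's generic options parsing, for example by running via ToolRunner. A minimal sketch under that assumption follows; the class name MapOnlyTool and the buildJob helper are hypothetical, and no mapper is set, so Hadoop's default identity Mapper is used.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MapOnlyTool extends Configured implements Tool {

    // Builds the job from the configuration that ToolRunner has already
    // populated with any -D key=value options given on the command line.
    public Job buildJob(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "map-only via -D");
        job.setJarByClass(MapOnlyTool.class);
        // No setNumReduceTasks call here: the reducer count comes from the
        // configuration, e.g. -D mapreduce.job.reduces=0.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job;
    }

    @Override
    public int run(String[] args) throws Exception {
        // args holds only the application arguments; generic options were stripped.
        return buildJob(args).waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MapOnlyTool(), args));
    }
}
```

Assuming the class is packaged in myJob.jar, it could be launched as `hadoop jar myJob.jar MapOnlyTool -D mapreduce.job.reduces=0 myInputDirs myOutputDir`. Generic options such as -D must come before the application's own arguments.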

Alex
  • 1
    Hadoop now gives a deprecation warning for -D mapred.reduce.tasks and recommends using -D mapreduce.job.reduces instead. – Adam Jan 27 '17 at 19:49
0

If you are using Oozie as a scheduler to manage your Hadoop jobs, you can simply set the property mapred.reduce.tasks (the default number of reduce tasks per job) to 0. Specify your mapper in the property mapreduce.map.class; there is no need to set mapreduce.reduce.class, since no reducers are required.

<configuration>
   <property>
     <name>mapreduce.map.class</name>
     <value>my.com.package.AbcMapper</value>
   </property>
   <property>
     <name>mapred.reduce.tasks</name>
     <value>0</value>
   </property>
   .
   .
   .
</configuration>
Neha Kumari