
job.setNumReduceTasks(0) results in a map-only job.

Does this mean the intermediate phase (shuffle and sort) is not performed?

How does this compare to having an empty reduce method (no operations)?

// Old-API (org.apache.hadoop.mapred) reducer whose reduce() emits nothing
public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    // do nothing
  }
}

Or are the two equivalent?

dbustosp
zaranaid

1 Answer


The difference is simple:

  1. Map-only job: there is no shuffle phase, so no map output is sent across the network. Each mapper writes its output directly to the job's output directory, without sorting it first. Check this out.
  2. Map-reduce job: even though your reducers do nothing, the map output is still sorted and sent to the reducers, so the shuffle phase still happens. The reducers then write their results to the output directory; since your reduce method never calls output.collect, those output files will contain no records.
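To make the difference concrete, here is a toy, in-memory Java model of the two job shapes described above. It is only a sketch of the data flow and does not use Hadoop at all: the word-count mapper, the Result class, and the method names are hypothetical, and the TreeMap simply stands in for Hadoop's shuffle-and-sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy, in-memory model of the two job shapes (NOT Hadoop's API).
// The mapper is a hypothetical word-count mapper that emits (word, 1).
public class MapOnlyVsEmptyReduce {

    // What a simulated job produced, plus how many records went through the "shuffle".
    static final class Result {
        final List<String> output;
        final int shuffledRecords;

        Result(List<String> output, int shuffledRecords) {
            this.output = output;
            this.shuffledRecords = shuffledRecords;
        }
    }

    // Hypothetical mapper: emits {word, "1"} for each whitespace-separated token.
    static List<String[]> map(String line) {
        List<String[]> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new String[] {word, "1"});
            }
        }
        return out;
    }

    // Map-only job (zero reduce tasks): map output goes straight to the job
    // output in the order the mapper emitted it; nothing is sorted or shuffled.
    static Result runMapOnly(String... lines) {
        List<String> output = new ArrayList<>();
        for (String line : lines) {
            for (String[] kv : map(line)) {
                output.add(kv[0] + "\t" + kv[1]);
            }
        }
        return new Result(output, 0);
    }

    // Map-reduce job whose reduce() does nothing: every map output record is
    // still grouped by key in sorted order (the TreeMap stands in for shuffle
    // and sort) and handed to a reducer that emits nothing.
    static Result runWithEmptyReducer(String... lines) {
        Map<String, List<String>> shuffled = new TreeMap<>();
        int shuffledRecords = 0;
        for (String line : lines) {
            for (String[] kv : map(line)) {
                shuffled.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
                shuffledRecords++;
            }
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : shuffled.entrySet()) {
            emptyReduce(group.getKey(), group.getValue(), output);
        }
        return new Result(output, shuffledRecords);
    }

    // Mirrors the empty reduce() from the question: it never writes to output.
    static void emptyReduce(String key, List<String> values, List<String> output) {
        // do nothing
    }

    public static void main(String[] args) {
        Result mapOnly = runMapOnly("b a", "a");
        Result emptyReduce = runWithEmptyReducer("b a", "a");
        System.out.println("map-only output: " + mapOnly.output
                + ", shuffled records: " + mapOnly.shuffledRecords);
        System.out.println("empty-reducer output: " + emptyReduce.output
                + ", shuffled records: " + emptyReduce.shuffledRecords);
    }
}
```

With the inputs "b a" and "a", the map-only job returns the three (word, 1) records in mapper order with no records shuffled, while the empty-reducer job moves all three records through the shuffle and still produces no output. In a real cluster that shuffle also means network transfer and sorting, which is why setNumReduceTasks(0) is the cheaper choice when no reduce step is needed.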
dbustosp