
I'm trying to modify existing code. I managed to print the key (group) and value (count of occurrences), but I need to extract only the one key that has the max value (count of occurrences). I'm not a Java expert, so kindly excuse me if I haven't explained the question properly.

current output:

994290  5
994380  33
994410  1
994440  11
995010  2
995030  5

Expected:

994380  33

Code:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
  extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values,
      Context context)
      throws IOException, InterruptedException {

    int count = 0;
    for (IntWritable value : values) {
        if(value.get() == 9999)
          count++;
    }
    context.write(key, new IntWritable(count));

  }
}
Varun

1 Answer


Your code is the Reducer in a MapReduce job, so within the Hadoop framework, inside the scope of the Reducer's reduce method (the one you posted), you only ever see one key with a bunch of values. To find the global maximum you have to aggregate the output of all the reducers (everything written by your context.write calls) and then pick the key with the max value from wherever you stored it. Writing the data to HBase or to another HDFS file should do the trick.

If you make all the mappers write to one reducer (which you can force with job.setNumReduceTasks(1)), then you can just keep a running max of the value and a variable holding the corresponding key. This is awkward in MapReduce because the reduce method is normally distributed across many nodes. Something like the code below, except you would need to move the variables to the class level, and it still won't solve your problem if you have more than one reducer:

  @Override
  public void reduce(Text key, Iterable<IntWritable> values,
      Context context)
      throws IOException, InterruptedException {

    int max = 0;
    String keyWithMax = "";
    for (IntWritable value : values) {
      if (value.get() > max) {
        max = value.get();
        keyWithMax = key.toString();
      }
    }
    // context.write expects a Text key, not a String
    context.write(new Text(keyWithMax), new IntWritable(max));
  }
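To make the class-level-variable idea concrete, here is a minimal plain-Java sketch with no Hadoop dependency. MaxKeySketch and its method names are illustrative stand-ins (reduce() mimics Reducer.reduce, cleanup() mimics Reducer.cleanup, which Hadoop calls once after all keys are processed); this is not Hadoop API:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the single-reducer pattern: reduce() is invoked once per key,
// the running max lives at the class level, and the answer is emitted once
// at the end (in a real Reducer this final write would go in cleanup()).
public class MaxKeySketch {
    private int max = Integer.MIN_VALUE;   // class-level running maximum
    private String keyWithMax = "";        // key that produced it

    // Stand-in for Reducer.reduce(): sum the counts for one key,
    // then update the class-level max if this key beats it.
    void reduce(String key, Iterable<Integer> values) {
        int count = 0;
        for (int v : values) count += v;
        if (count > max) {
            max = count;
            keyWithMax = key;
        }
    }

    // Stand-in for Reducer.cleanup(): runs once after all keys are seen.
    String cleanup() {
        return keyWithMax + "\t" + max;
    }

    public static void main(String[] args) {
        MaxKeySketch r = new MaxKeySketch();
        Map<String, Iterable<Integer>> groups = new LinkedHashMap<>();
        groups.put("994290", Arrays.asList(5));
        groups.put("994380", Arrays.asList(33));
        groups.put("994410", Arrays.asList(1));
        for (Map.Entry<String, Iterable<Integer>> e : groups.entrySet())
            r.reduce(e.getKey(), e.getValue());
        System.out.println(r.cleanup()); // prints the max key and its count
    }
}
```

The key design point is that nothing is written inside reduce(); the single write happens after the last group, which is exactly what moving the variables to the class level and overriding cleanup() buys you in a real single-reducer job.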
Mark Giaconia
  • Thanks @markg. I will try this out (y) – Varun Sep 29 '17 at 16:38
  • If you are running multiple reducers, the way I've done it is to write each reducer's output to a file, then run the hadoop command to merge "part files" into one, then you can read that into somewhere. The other (better IMO) approach is something like HBase as the output format, so you can easily access the output data from an application after the job runs. – Mark Giaconia Sep 29 '17 at 19:11
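The merge-then-scan approach from that comment can be sketched like this (hadoop fs -getmerge is the real merge command; the HDFS path is illustrative, and sample data stands in for the merged reducer output):

```shell
# After the job finishes, merge all reducer part files into one local file:
#   hadoop fs -getmerge /path/to/job/output merged.txt
# (path is illustrative). Sample data standing in for that merged output:
printf '994290\t5\n994380\t33\n994410\t1\n' > merged.txt
# One sort finds the key with the max count (numeric, descending on field 2):
sort -k2,2nr merged.txt | head -1
```

The sort/head pipeline prints the single line with the highest count, which is the expected `994380  33` output from the question.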