
I am doing a secondary sort in Hadoop 2.6.0, following this tutorial: https://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/

I have exactly the same code, but now I am trying to improve performance, so I decided to add a combiner. I made two modifications:

Main file:

job.setCombinerClass(CombinerK.class);

Combiner file:

public class CombinerK extends Reducer<KeyWritable, KeyWritable, KeyWritable, KeyWritable> {

    public void reduce(KeyWritable key, Iterator<KeyWritable> values, Context context) throws IOException, InterruptedException {


        Iterator<KeyWritable> it = values;

        System.err.println("combiner " + key);

        KeyWritable first_value = it.next();
        System.err.println("va: " + first_value);

        while (it.hasNext()) {

            sum += it.next().getSs();

        }
        first_value.setS(sum);
        context.write(key, first_value);


    }
}
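One thing worth checking, assuming `CombinerK` extends the new-API `org.apache.hadoop.mapreduce.Reducer` (it uses `Context`, so it appears to): that class declares `reduce(KEYIN key, Iterable<VALUEIN> values, Context context)`, and its default body writes every value back out unchanged. A method taking `Iterator<KeyWritable>` is therefore an overload, not an override, so the framework would keep calling the inherited identity `reduce`. That would fit both symptoms: no "combiner" lines in the logs, and identical combine input and output counters. Annotating the method with `@Override` makes the compiler reject a mismatched signature. The plain-Java sketch below mimics this with hypothetical stand-in classes (`BaseReducer`, `IteratorCombiner`, `IterableCombiner`); they are not Hadoop's real classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OverloadDemo {

    // Hypothetical stand-in for Hadoop's Reducer: the "framework" only ever calls
    // reduce(key, Iterable), and the default body is an identity pass-through.
    static class BaseReducer {
        final List<String> output = new ArrayList<>();

        protected void reduce(String key, Iterable<Integer> values) {
            for (Integer v : values) {
                output.add(key + "=" + v); // one output record per input record
            }
        }

        // Stand-in for the framework driving the reducer once per key group.
        final void run(String key, List<Integer> values) {
            reduce(key, values);
        }
    }

    // Mirrors the question: the parameter is an Iterator, so this is an overload
    // that run() never calls; the inherited identity reduce() runs instead.
    static class IteratorCombiner extends BaseReducer {
        public void reduce(String key, Iterator<Integer> values) {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            output.add(key + "=" + sum);
        }
    }

    // Fixed version: Iterable parameter plus @Override, so the compiler checks the signature.
    static class IterableCombiner extends BaseReducer {
        @Override
        protected void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (Integer v : values) {
                sum += v;
            }
            output.add(key + "=" + sum);
        }
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);

        BaseReducer broken = new IteratorCombiner();
        broken.run("k", values);
        System.out.println(broken.output); // [k=1, k=2, k=3]

        BaseReducer fixed = new IterableCombiner();
        fixed.run("k", values);
        System.out.println(fixed.output); // [k=6]
    }
}
```

In the real combiner the equivalent change would be declaring the parameter as `Iterable<KeyWritable> values` with `@Override` on the method, then iterating with a for-each loop.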

But it seems that it is not being run, because I can't find any log file containing the word "combiner". When I checked the counters after running the job, I saw:

    Combine input records=4040000
    Combine output records=4040000

The combiner seems to be executed, but it appears to be called once for every key, which is why it reports the same number of input and output records.
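If a combiner really were invoked once per group of equal keys, the two counters could only match when no key repeats within the sorted data it sees. A simplified model of that arithmetic follows, using a hypothetical helper `combineCounts` over sorted string keys; it assumes a combiner that emits exactly one record per group and is not Hadoop's actual counter bookkeeping.

```java
import java.util.Arrays;
import java.util.List;

public class CombineCounters {

    // Returns {combineInputRecords, combineOutputRecords} for one sorted run of keys,
    // assuming a combiner that emits exactly one record per group of equal keys.
    static long[] combineCounts(List<String> sortedKeys) {
        long input = sortedKeys.size();
        long output = 0;
        String previous = null;
        for (String key : sortedKeys) {
            if (previous == null || !previous.equals(key)) {
                output++; // a new key group starts here
            }
            previous = key;
        }
        return new long[] {input, output};
    }

    public static void main(String[] args) {
        // Repeated keys: 5 input records collapse into 2 output records.
        System.out.println(Arrays.toString(combineCounts(Arrays.asList("a", "a", "a", "b", "b")))); // [5, 2]
        // All keys distinct: the counters match even with a working combiner.
        System.out.println(Arrays.toString(combineCounts(Arrays.asList("a", "b", "c")))); // [3, 3]
    }
}
```

Under this model, identical counters mean either that every key in each sorted run was distinct, or that whatever ran as the combiner emitted one record per input record.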

ie8888
  • You can't really tell much from the combine input and output record counts, except that they indicate the combiner is run. As for the equal input and output counts, maybe a single mapper doesn't see identical keys. You could read http://stackoverflow.com/questions/17160852/on-what-basis-mapreduce-framework-decides-whether-to-launch-a-combiner-or-not for more insight into the combiner. – Vignesh I Oct 04 '15 at 10:47
  • @VigneshI I have run multiple mappers (by increasing the file size), and I emitted identical K,V pairs to check: I duplicated the context.write line with the same arguments. Since I know I sent identical K,V pairs, I think my composite key needs to implement some method, or something like that, to check for equality. – ie8888 Oct 04 '15 at 12:36
  • You won't get any hints in the log other than the combine input and output record counts. Put a sysout in your combiner, run your MR job, and check the stdout logs via the JobTracker URL on the reduce side. – Vignesh I Oct 04 '15 at 17:00
  • @VigneshI I had checked that before, and it didn't work either. Thanks for helping. – ie8888 Oct 04 '15 at 18:34
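On the composite-key theory in the comments: as far as I know, the combiner groups its input with the job's sort comparator, not with `equals`, and not with the grouping comparator registered via `job.setGroupingComparatorClass`, which applies on the reduce side. With a secondary-sort key whose comparison covers both the natural key and the secondary field, only keys equal in both fields land in the same combiner call. Exact duplicates, like the ones described in the comment, compare equal under any consistent `compareTo`, so grouping alone should still merge them. The plain-Java sketch below uses a hypothetical `CompositeKey` and helper `countGroups` (not the tutorial's classes) to contrast full-key grouping with natural-key grouping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

public class CompositeKeyGrouping {

    // Hypothetical plain-Java stand-in for a secondary-sort composite key:
    // a natural key plus a secondary field that controls value order.
    static final class CompositeKey implements Comparable<CompositeKey> {
        final String natural;
        final int secondary;

        CompositeKey(String natural, int secondary) {
            this.natural = natural;
            this.secondary = secondary;
        }

        // Full sort order: natural key first, then the secondary field.
        @Override
        public int compareTo(CompositeKey o) {
            int c = natural.compareTo(o.natural);
            return c != 0 ? c : Integer.compare(secondary, o.secondary);
        }

        // equals/hashCode kept consistent with compareTo.
        @Override
        public boolean equals(Object other) {
            if (!(other instanceof CompositeKey)) {
                return false;
            }
            CompositeKey k = (CompositeKey) other;
            return natural.equals(k.natural) && secondary == k.secondary;
        }

        @Override
        public int hashCode() {
            return Objects.hash(natural, secondary);
        }
    }

    // Counts groups in a sorted list: adjacent keys comparing as 0 form one group.
    static int countGroups(List<CompositeKey> sorted, Comparator<CompositeKey> grouping) {
        int groups = 0;
        CompositeKey previous = null;
        for (CompositeKey k : sorted) {
            if (previous == null || grouping.compare(previous, k) != 0) {
                groups++;
            }
            previous = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = new ArrayList<>(Arrays.asList(
                new CompositeKey("a", 2), new CompositeKey("a", 1),
                new CompositeKey("a", 1), new CompositeKey("b", 5)));
        keys.sort(null); // natural ordering via compareTo

        // Full-key grouping (sort-comparator style): only the exact duplicate merges.
        System.out.println(countGroups(keys, CompositeKey::compareTo)); // 3
        // Natural-key grouping (reduce-side grouping-comparator style).
        System.out.println(countGroups(keys, (x, y) -> x.natural.compareTo(y.natural))); // 2
    }
}
```

So a combiner for this kind of key would, under these assumptions, only merge records whose natural key and secondary field both match, which is why it can see far less reduction than the reducer does.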

0 Answers