Hadoop KeyComposite and Combiner

Question

I am doing a secondary sort in Hadoop 2.6.0, following this tutorial: https://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/

I have the exact same code, but now I am trying to improve performance, so I have decided to add a combiner. I have made two modifications:

Main file:

job.setCombinerClass(CombinerK.class);

Combiner file:

public class CombinerK extends Reducer<KeyWritable, KeyWritable, KeyWritable, KeyWritable> {

    public void reduce(KeyWritable key, Iterator<KeyWritable> values, Context context) throws IOException, InterruptedException {


        Iterator<KeyWritable> it = values;

        System.err.println("combiner " + key);

        KeyWritable first_value = it.next();
        System.err.println("va: " + first_value);

        while (it.hasNext()) {

            sum += it.next().getSs();

        }
        first_value.setS(sum);
        context.write(key, first_value);


    }
}

But it seems that it is not being run, because I can't find any log file that contains the word "combiner". When I looked at the counters after the run, I saw:

    Combine input records=4040000
    Combine output records=4040000

The combiner seems to be executed, but it appears to receive a separate call for each record, which would explain why the input count equals the output count.
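
One possible explanation, offered as a hedged note rather than a confirmed diagnosis: in the new API, `Reducer.reduce` takes an `Iterable<VALUEIN>`, while the method in `CombinerK` takes an `Iterator`. A method with a different parameter type is an overload, not an override, so the framework would keep calling the inherited default `reduce`, which writes every value back unchanged. That would match both symptoms: no "combiner" line in the logs, and equal input and output record counts. The sketch below reproduces the effect without Hadoop; `MiniReducer` and the two combiner classes are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OverrideDemo {

    // Hypothetical stand-in for Hadoop's Reducer: the default reduce()
    // is an identity function that emits every value unchanged.
    static class MiniReducer {
        final List<String> out = new ArrayList<>();

        protected void reduce(String key, Iterable<Integer> values) {
            for (Integer v : values) {
                out.add(key + "=" + v);
            }
        }

        // Plays the role of the framework: it only knows the Iterable signature.
        void run(String key, List<Integer> values) {
            reduce(key, values);
        }
    }

    // Takes an Iterator, so this is an overload and run() never calls it.
    static class IteratorCombiner extends MiniReducer {
        public void reduce(String key, Iterator<Integer> values) {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            out.add(key + "=" + sum);
        }
    }

    // Takes an Iterable, so @Override compiles and run() dispatches here.
    static class IterableCombiner extends MiniReducer {
        @Override
        protected void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            out.add(key + "=" + sum);
        }
    }

    // Helper: feed three values for one key and return what was emitted.
    static List<String> emit(MiniReducer reducer) {
        reducer.run("k", Arrays.asList(1, 2, 3));
        return reducer.out;
    }

    public static void main(String[] args) {
        System.out.println(emit(new IteratorCombiner())); // three records, nothing combined
        System.out.println(emit(new IterableCombiner())); // one combined record
    }
}
```

Adding `@Override` to the real combiner's `reduce` is a cheap way to check this: if the signature does not match the one in `Reducer`, the compiler rejects it.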

You cannot really tell from the combine input and output record counts alone, but they do indicate that the combiner is run. As for the counts being equal, maybe a single mapper does not produce identical keys. You could read http://stackoverflow.com/questions/17160852/on-what-basis-mapreduce-framework-decides-whether-to-launch-a-combiner-or-not to get more insight into the combiner. — Vignesh I, Oct 04 '15 at 10:47
@VigneshI I have created multiple mappers (by increasing the file size) and sent identical K,V pairs to check: I duplicated the context.write line with the same arguments. Since I know that I sent identical K,V pairs, I think my composite key needs to implement some method, or something like that, to check for equality. — ie8888, Oct 04 '15 at 12:36
You will not get any hints in the log other than the combine input and output record counts. Place a sysout in your combiner, run your MR job, and check the stdout logs via the jobtracker URL on the reduce side. — Vignesh I, Oct 04 '15 at 17:00
@VigneshI I had checked that before, and it didn't work either. Thanks for helping. — ie8888, Oct 04 '15 at 18:34
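
On the question of whether the composite key needs an equality method: as far as I know, Hadoop does not use `equals()` to decide which records share a `reduce` call. It sorts the map output and treats consecutive keys as one group when the comparator returns 0. With a secondary-sort composite key, that would mean, by default, that only records whose full composite key compares equal reach the combiner together. A minimal sketch of that grouping rule follows, using a hypothetical `CompositeKey` made of a natural key plus a secondary field (it is not the tutorial's class, and the comparators are plain `java.util.Comparator` instances rather than Hadoop comparators).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupingDemo {

    // Hypothetical composite key: natural key plus a secondary sort field.
    static class CompositeKey {
        final String natural;
        final int secondary;

        CompositeKey(String natural, int secondary) {
            this.natural = natural;
            this.secondary = secondary;
        }
    }

    // Sort order over the full composite key.
    static final Comparator<CompositeKey> FULL_KEY =
            Comparator.comparing((CompositeKey k) -> k.natural)
                      .thenComparingInt(k -> k.secondary);

    // Comparator that looks only at the natural key.
    static final Comparator<CompositeKey> NATURAL_ONLY =
            Comparator.comparing((CompositeKey k) -> k.natural);

    // Sort by the full key, then count runs of consecutive keys that the
    // grouping comparator considers equal (compare == 0).
    static int countGroups(List<CompositeKey> keys, Comparator<CompositeKey> grouping) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(FULL_KEY);
        int groups = 0;
        CompositeKey previous = null;
        for (CompositeKey current : sorted) {
            if (previous == null || grouping.compare(previous, current) != 0) {
                groups++;
            }
            previous = current;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = Arrays.asList(
                new CompositeKey("a", 1), new CompositeKey("a", 1),
                new CompositeKey("a", 2), new CompositeKey("b", 1));
        System.out.println(countGroups(keys, FULL_KEY));     // groups by full composite key
        System.out.println(countGroups(keys, NATURAL_ONLY)); // groups by natural key only
    }
}
```

In this sketch the two identical `("a", 1)` keys fall into the same group under either comparator, so duplicated K,V pairs should be combinable without any extra equality method, provided the combiner's `reduce` is actually the method being called.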
