
If I have only one key, can I avoid it being sent to a single reducer and instead distribute its values across multiple reducers?

I understand that I might then need a second MapReduce job to combine the reducer outputs. Is this a good approach, or is there a better way?

Gadam

1 Answer


I was in a similar situation once. What I did was something like this:

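// Fields of the Mapper subclass (input: LongWritable/Text, output: IntWritable/Text)
int numberOfReduceCalls = 5;  // number of distinct keys, and so of reduce() calls
IntWritable outKey = new IntWritable();
Random random = new Random();

@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // Replace the single real key with a random integer in [0, numberOfReduceCalls)
    outKey.set(random.nextInt(numberOfReduceCalls));
    context.write(outKey, value);
}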
blackSmith
  • Sorry, I did not get what you are trying to do here. What is the purpose of `outKey`? You are not using it after you set its value. And you seem to be doing all this in the mapper itself. Can you please explain? – Gadam Nov 14 '14 at 16:49
  • It was a mistake; I have updated the `context.write` statement, have a look. Instead of writing a custom partitioner, I generated 5 keys randomly in the map, so that the values are distributed across 5 different reduce calls. If you're using `TextInputFormat`, the above code is sufficient to get the job done. In `reduce` you simply ignore the key and process only the values. – blackSmith Nov 17 '14 at 08:41
  • So this is basically mimicking the partitioner functionality without actually using one. And I assume we still need a second map-reduce to combine the reducer outputs. I will +1 this and wait to see if there are any other approaches. Thanks! – Gadam Nov 17 '14 at 16:45
  • Other ways certainly exist, but I just took advantage of the `map-reduce` feature itself rather than introducing a new type. But you can't avoid the merging in any case when using multiple reducers; it's an inevitable trade-off. http://stackoverflow.com/questions/5700068/merge-output-files-after-reduce-phase – blackSmith Nov 18 '14 at 04:00
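
The two-stage idea discussed in this thread (scatter the values of one key over several random keys, then merge the partial results) can be sketched in plain Java without any Hadoop classes. This is only a simulation under stated assumptions: the class name `SaltedSum`, its methods, and the choice of a sum as the aggregation are hypothetical and not part of the original answer. The approach only gives a correct result when the aggregation is associative and commutative (sum, count, max and similar).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Plain-Java simulation of the two-stage idea; no Hadoop classes are used.
// SaltedSum and its methods are hypothetical names chosen for illustration.
class SaltedSum {

    // Stage 1: stands in for the mapper plus the separate reduce calls.
    // Each value gets a random key in [0, buckets), and each bucket
    // (one reduce call) sums only the values it received.
    static long[] partialSums(List<Long> values, int buckets, Random random) {
        long[] partials = new long[buckets];
        for (long value : values) {
            int saltedKey = random.nextInt(buckets);
            partials[saltedKey] += value;
        }
        return partials;
    }

    // Stage 2: the merge step. A single pass combines the per-bucket results.
    static long combine(long[] partials) {
        long total = 0;
        for (long partial : partials) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(3L, 1L, 4L, 1L, 5L, 9L, 2L, 6L);
        long[] partials = partialSums(values, 5, new Random());
        // The total is 31 regardless of how the values were scattered.
        System.out.println(combine(partials));
    }
}
```

In a real job, stage 2 would be the second MapReduce pass or the output merge mentioned in the comments. It is cheap because it sees only one partial result per reduce call rather than every input value.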