0

In a MapReduce job, suppose you configure 4 reducers. The job then writes 4 part files as its final output, like:

part-r-00000
part-r-00001
part-r-00002
part-r-00003

In this scenario, how do I get a single consolidated value? Say I am calculating a maximum: the job above will produce 4 different files, each holding a different value.

Sandeep Singh
Shakti Kumar

1 Answer

1

The short answer is to use a single reducer in your case.

But when the mappers produce more output than a single reducer can handle, I suggest running two rounds of MapReduce.

In the first round, each reduce task outputs its own consolidated value. In the second round, a single reducer computes the final consolidated value from the reduced data set produced by the first round.
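
To make the two rounds concrete for the maximum case, here is a minimal sketch in plain Java. It only simulates the idea and does not use any Hadoop classes: the class name `TwoRoundMax`, the methods `roundOne` and `roundTwo`, and the sample numbers are all made up for illustration, and each inner list stands in for the values that reach one reduce task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java simulation of the two-round idea; no Hadoop classes are used.
// All names and sample values here are hypothetical.
public class TwoRoundMax {

    // Round 1: each reduce task sees only its own partition
    // and emits one local maximum (one value per part file).
    static List<Integer> roundOne(List<List<Integer>> partitions) {
        List<Integer> localMaxima = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            if (!partition.isEmpty()) { // a reducer with no input emits nothing
                localMaxima.add(Collections.max(partition));
            }
        }
        return localMaxima;
    }

    // Round 2: a single reducer reads every local maximum
    // and emits the global one. Assumes at least one value exists.
    static int roundTwo(List<Integer> localMaxima) {
        return Collections.max(localMaxima);
    }

    public static void main(String[] args) {
        // Four partitions, standing in for the input of 4 reducers.
        List<List<Integer>> partitions = Arrays.asList(
            Arrays.asList(3, 9),
            Arrays.asList(42, 7),
            Arrays.asList(5),
            Arrays.asList(11, 2));

        List<Integer> localMaxima = roundOne(partitions);
        System.out.println("round 1: " + localMaxima);           // round 1: [9, 42, 5, 11]
        System.out.println("round 2: " + roundTwo(localMaxima)); // round 2: 42
    }
}
```

The second round only has to read one value per first-round reducer, so a single reducer can handle it even when the original mapper output is large.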

If that still doesn't solve your problem, you may want to look at the grouping comparator in Hadoop MapReduce.

luoluo
  • Thanks a lot for your explanation. So we have to run multiple MapReduce jobs until we get the intended output? In that case we have to write another MapReduce job for the resulting files. Is my understanding correct? – Shakti Kumar Aug 24 '15 at 09:58
  • Yes. You can also define your own `grouping comparator` in MR. – luoluo Aug 24 '15 at 10:11