I want to know how many times a reducer is called in a MapReduce program. What I know is that the number of mappers equals the number of input splits, i.e. one mapper runs for each input split, and the output of each mapper is passed to the reducer. Is that output passed to the reducer piece by piece, or does the reducer get all the data at once and then process (reduce) it? In short, I want to understand the flow of the reducer.
3 Answers
3
The reducer's `reduce()` method is usually called once for each unique key, but you can specify a grouping comparator (set with `Job.setGroupingComparatorClass`, e.g. for secondary sort), and `reduce()` would then be called once for each group of keys, as determined by that comparator.
Although log messages might seem to imply that the reduce step starts before the mappers are all complete, what starts early is only the copying (shuffle) of map output to the reduce tasks; `reduce()` itself isn't called until all mappers are complete.
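To make the "once per unique key" rule concrete, here is a minimal plain-Java sketch that simulates it without Hadoop. The class `ReduceCallDemo` and its methods are hypothetical stand-ins, not the Hadoop API: `reduce` plays the role of `Reducer.reduce`, and `runReducePhase` imitates the sort/group step that runs after all map output has been collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceCallDemo {
    // Counts how many times reduce() was invoked in the last run.
    public static int reduceCalls = 0;

    // Hypothetical stand-in for Reducer.reduce(key, values, context):
    // receives ONE key together with ALL values emitted for that key.
    public static int reduce(String key, List<Integer> values) {
        reduceCalls++;
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Simulates the reduce side: group all map output by key (sorted),
    // then call reduce() once per distinct key.
    public static Map<String, Integer> runReducePhase(List<Map.Entry<String, Integer>> mapOutput) {
        reduceCalls = 0;
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Five (key, value) pairs as they might come out of several mappers.
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("apple", 1), Map.entry("pear", 1), Map.entry("apple", 1),
                Map.entry("fig", 1), Map.entry("apple", 1));
        Map<String, Integer> counts = runReducePhase(mapOutput);
        System.out.println(counts + " reduce calls: " + reduceCalls);
    }
}
```

In this sketch the five pairs contain three distinct keys, so `reduce` runs three times, and each call sees every value for its key at once.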

Chris Gerken
Thanks, this helped. I was actually confused about the basis on which the reducer is called. Thanks :) – u12345 Dec 24 '14 at 06:42
0
Ideally, the reduce phase can start immediately after the first mapper finishes successfully.
You may want to refer to a similar question: when-do-reduce-tasks-start-in-hadoop
0
- The number of mappers depends on the number of input splits.
- The number of reducers is set by the user.
You can specify:
mapreduce.job.reduces=N
You can set the number of reducers to 0 if you want (a map-only job).
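As a rough illustration of what N reducers means for the keys, here is a plain-Java sketch of how keys could be spread over N reduce tasks. It uses the same formula as Hadoop's default `HashPartitioner`, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`, but applies it to `String.hashCode()` rather than a Hadoop `Text` key, so the actual partition numbers in a real job would differ. The class `PartitionDemo` and its method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class PartitionDemo {
    // Same formula as Hadoop's default HashPartitioner, applied here to
    // String.hashCode() (an assumption made for this sketch).
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Maps each reduce task index to the set of distinct keys it would receive.
    public static Map<Integer, Set<String>> assignKeys(List<String> keys, int numReduceTasks) {
        Map<Integer, Set<String>> byTask = new TreeMap<>();
        for (String key : keys) {
            byTask.computeIfAbsent(partitionFor(key, numReduceTasks), t -> new TreeSet<>()).add(key);
        }
        return byTask;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "pear", "apple", "fig", "kiwi");
        // With N = 2, every distinct key lands on exactly one of two reduce tasks.
        System.out.println(assignKeys(keys, 2));
    }
}
```

Whatever N is, every occurrence of a given key goes to the same reduce task, so the total number of `reduce()` calls across all tasks still equals the number of distinct keys (or key groups).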

ALSimon
I was actually asking how many times a reducer is called, i.e. how many times one reducer is called. Anyway, thanks for the reply :) – u12345 Dec 24 '14 at 06:38
A reducer is called only once, unless speculative execution is enabled. You can change this with `mapreduce.reduce.speculative`. See https://developer.yahoo.com/hadoop/tutorial/module4.html for more information. – ALSimon Dec 26 '14 at 12:07