How Hadoop reduce tasks deal with map-grouped data

Question

The reduce method deals with grouped data from the map phase. But how do reduce tasks receive those groups? If the maps output many groups, does each reduce task read the same number of groups? What is the mechanism?

score 0 · Answer 1 · edited May 23 '17 at 12:31

how do reduce tasks take the groups data?

It is handled in the Shuffle and Sort phase.

During this phase, the data sent by the mappers is grouped by key (like a GROUP BY on the key), producing (key, list of values) pairs. These results are sent to the reducers. If results need to go to different reducers, that is taken care of by the partition phase, which is separate from the Shuffle and Sort phase.
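To make the partition and grouping steps concrete, here is a minimal plain-Java sketch that does not use the Hadoop libraries. The class name `ShuffleSketch` and its methods are hypothetical and only simulate the behavior. The partition formula mirrors what Hadoop's default `HashPartitioner` computes, and a `TreeMap` stands in for the sorted, grouped input each reducer sees. It assumes Java 9 or later for `List.of` and `Map.entry`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder modulo the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Simulates partitioning plus shuffle/sort: for each reducer, returns a
    // map sorted by key, where each key holds the list of its values.
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapOutput, int numReduceTasks) {
        List<TreeMap<String, List<Integer>>> reducers = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            reducers.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> pair : mapOutput) {
            int p = partitionFor(pair.getKey(), numReduceTasks);
            // All values for one key end up in the same reducer, in one group.
            reducers.get(p)
                    .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                    .add(pair.getValue());
        }
        return reducers;
    }

    public static void main(String[] args) {
        // Hypothetical map output, e.g. from a word count job.
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("apple", 1), Map.entry("banana", 1),
                Map.entry("apple", 1), Map.entry("cherry", 1),
                Map.entry("banana", 1));

        List<TreeMap<String, List<Integer>>> reducers = shuffle(mapOutput, 2);
        for (int i = 0; i < reducers.size(); i++) {
            System.out.println("reducer " + i + " -> " + reducers.get(i));
        }
    }
}
```

Because the target reducer depends only on the key's hash, nothing guarantees that every reduce task receives the same number of groups. The only guarantee is that all values for a given key go to exactly one reducer.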

This phase is performed by the Hadoop framework, and as far as I know you do not have to do or change anything about it.

I also suggest taking a look at this question: What is the purpose of shuffling and sorting phase in the reducer in Map Reduce Programming?
