As I understand it, the mapper's main objective is to produce a key-sorted list to send to the reducer. If that list is very large, it needs to be partitioned in the mapper so that the reducer can handle it (I mean that if the value list for a single key is huge, it needs to be partitioned). But why exactly does Hadoop need to sort the keys in the mapper? Someone asked me this question and I couldn't fully convince him. I am just a beginner and was a bit curious. Any help is appreciated.
1 Answer
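Sorting happens after the map phase and before the reducer runs, as part of the framework's shuffle and sort step; you are not required to do it explicitly. Because the map output arrives sorted by key, the reducer can merge the outputs of all mappers and receive all values for a given key together.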
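To make the "why" a bit more concrete, here is a rough simulation in plain Python (not Hadoop code). It assumes a word-count style job; `run_mapper` and `reduce_side` are hypothetical names made up for this sketch. The point it tries to show is that once each mapper's output is sorted by key, the reduce side only needs a cheap streaming merge to see every value for a key in one contiguous run.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def run_mapper(lines):
    """Emit (word, 1) pairs, then sort them by key, as the framework would."""
    pairs = [(word, 1) for line in lines for word in line.split()]
    return sorted(pairs, key=itemgetter(0))


def reduce_side(sorted_map_outputs):
    """Merge already-sorted map outputs and sum the values for each key."""
    # heapq.merge streams the inputs in key order without re-sorting them.
    merged = heapq.merge(*sorted_map_outputs, key=itemgetter(0))
    # Equal keys are now adjacent, so groupby yields one group per key.
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(merged, key=itemgetter(0))
    ]


# Two simulated mappers, each working on its own input split.
map_outputs = [
    run_mapper(["b a", "a c"]),
    run_mapper(["c b b"]),
]
result = reduce_side(map_outputs)
print(result)
```

Without the per-mapper sort, the reduce side would have to buffer or re-sort everything before it could be sure it had seen all values for a key.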
Please refer to the similar question.