From here:
As per Hadoop: The Definitive Guide: "Within each partition, the background thread performs an in-memory sort by key, and if there is a combiner function, it is run on the output of the sort."
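The step the quote describes can be simulated in a few lines. This is a hedged Python sketch, not Hadoop code: `partition_for`, `spill` and `sum_combiner` are hypothetical names, and `zlib.crc32` stands in for the `hashCode()`-based default `HashPartitioner` only so that the example is deterministic (Python's built-in string hash is randomized per process). It buckets map output by partition, sorts each bucket by key, then optionally runs a combiner over each run of equal keys.

```python
import zlib
from itertools import groupby
from operator import itemgetter

def partition_for(key: str, num_reducers: int) -> int:
    # Hypothetical stand-in for Hadoop's default HashPartitioner;
    # crc32 is used only because it is stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def spill(records, num_reducers, combiner=None):
    """Simulate one map-side spill: bucket records by partition, sort each
    bucket by key, then optionally combine each run of equal keys."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in records:
        partitions[partition_for(key, num_reducers)].append((key, value))
    result = []
    for bucket in partitions:
        bucket.sort(key=itemgetter(0))  # in-memory sort by key within the partition
        if combiner is not None:
            # The combiner sees the sorted output, one run of equal keys at a time.
            bucket = [(k, combiner(k, [v for _, v in group]))
                      for k, group in groupby(bucket, key=itemgetter(0))]
        result.append(bucket)
    return result

def sum_combiner(key, values):
    return sum(values)

records = [("cat", 1), ("dog", 1), ("ant", 1), ("cat", 1), ("emu", 1), ("dog", 1)]
for i, bucket in enumerate(spill(records, 2, sum_combiner)):
    print(i, bucket)
```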
I thought a partition corresponds to one key, and thus a reduce task would reduce a bunch of values associated with only one key. If there is only one key, isn't the partition already sorted?
Moreover, this answer from here seems to me to contradict the previous quote:
Sorting saves time for the reducer, helping it easily distinguish when a new reduce task should start. It simply starts a new reduce task, when the next key in the sorted input data is different than the previous, to put it simply.
It is saying that a reduce task is associated with one key, and since there is one partition per reduce task, a partition is associated with one key. So why must there be a sort within each partition if there is only one key?
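The reducer-side behaviour described in the quoted answer can be sketched the same way. Again, this is a hypothetical Python illustration, not Hadoop's API: `reduce_sorted` walks key-sorted pairs and calls a caller-supplied `reduce_fn` each time the key changes. For comparison, Hadoop's `Reducer.run()` loops over `context.nextKey()` and calls `reduce()` once per distinct key, all within a single task.

```python
_MISSING = object()  # sentinel meaning "no key seen yet"

def reduce_sorted(pairs, reduce_fn):
    """Call reduce_fn(key, values) once per run of equal keys in `pairs`.

    Assumes `pairs` is already sorted by key; a new group starts whenever
    the current key differs from the previous one.
    """
    results = []
    current_key = _MISSING
    current_values = []
    for key, value in pairs:
        if current_key is not _MISSING and key != current_key:
            # Key changed: the previous group is complete, so reduce it.
            results.append(reduce_fn(current_key, current_values))
            current_values = []
        current_key = key
        current_values.append(value)
    if current_key is not _MISSING:
        # Flush the final group.
        results.append(reduce_fn(current_key, current_values))
    return results

def count(key, values):
    return (key, sum(values))

print(reduce_sorted([("ant", 1), ("cat", 1), ("cat", 1), ("dog", 1)], count))
# → [('ant', 1), ('cat', 2), ('dog', 1)]
print(reduce_sorted([("cat", 1), ("dog", 1), ("cat", 1)], count))
# → [('cat', 1), ('dog', 1), ('cat', 1)]
```

The second call shows why the input must be sorted: with unsorted input, the two "cat" records are not adjacent, so they end up in separate groups.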