Combiners are there to save network bandwidth.
The map output is sorted directly:
sorter.sort(MapOutputBuffer.this, kvstart, endPosition, reporter);
This happens right after the actual mapping is done. While iterating through the buffer, it checks whether a combiner has been set; if so, it combines the records. If not, it spills directly to disk.
The important parts are in the MapTask, if you'd like to see it for yourself.
sorter.sort(MapOutputBuffer.this, kvstart, endPosition, reporter);
// some fields
for (int i = 0; i < partitions; ++i) {
    // check if configured
    if (combinerRunner == null) {
        // spill directly
    } else {
        combinerRunner.combine(kvIter, combineCollector);
    }
}
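To make the effect of that loop concrete, here is a minimal, self-contained sketch that does not use Hadoop at all. Everything in it (the SpillSketch class, Rec, sortAndSpill, count) is a hypothetical name made up for illustration, and the "combiner" is assumed to be a simple sum over values with the same key. It only mimics the idea: sort the buffer, then, per partition, either spill the records as they are or combine them first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; this is not Hadoop's MapTask code.
public class SpillSketch {

    /** One buffered map output record: target partition, key, value. */
    public static final class Rec {
        public final int partition;
        public final String key;
        public final int value;

        public Rec(int partition, String key, int value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Sorts a copy of the buffer by (partition, key) and builds one spill
     * segment per partition. With combine == true, adjacent records that
     * share a key are summed into one record (the assumed combiner).
     */
    public static List<List<Rec>> sortAndSpill(List<Rec> buffer, int partitions, boolean combine) {
        List<Rec> sorted = new ArrayList<>(buffer);
        sorted.sort(Comparator.comparingInt((Rec r) -> r.partition).thenComparing(r -> r.key));

        List<List<Rec>> spill = new ArrayList<>();
        int pos = 0;
        for (int i = 0; i < partitions; ++i) {
            List<Rec> segment = new ArrayList<>();
            while (pos < sorted.size() && sorted.get(pos).partition == i) {
                Rec r = sorted.get(pos++);
                int last = segment.size() - 1;
                if (combine && last >= 0 && segment.get(last).key.equals(r.key)) {
                    // Same key as the previous record: fold the value into it.
                    Rec prev = segment.get(last);
                    segment.set(last, new Rec(i, r.key, prev.value + r.value));
                } else {
                    // No combiner, or a new key: keep the record as it is.
                    segment.add(r);
                }
            }
            spill.add(segment);
        }
        return spill;
    }

    /** Total number of records across all spill segments. */
    public static int count(List<List<Rec>> spill) {
        int n = 0;
        for (List<Rec> segment : spill) {
            n += segment.size();
        }
        return n;
    }

    public static void main(String[] args) {
        List<Rec> buffer = Arrays.asList(
                new Rec(0, "a", 1), new Rec(1, "b", 1), new Rec(0, "a", 1),
                new Rec(0, "a", 1), new Rec(1, "b", 1));
        System.out.println(count(sortAndSpill(buffer, 2, false))); // prints 5
        System.out.println(count(sortAndSpill(buffer, 2, true)));  // prints 2
    }
}
```

The sort has to come first: because equal keys end up next to each other, the sketch can combine in a single pass over each partition.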
This is the right stage to save disk space and network bandwidth, because it is very likely that the output has to be transferred.
During the merge/shuffle/sort phase it is not beneficial, because by then you have to crunch more data compared with running the combiner when the map finishes.
Note that the sort phase shown in the web interface is misleading: it is just pure merging.