This requires a TotalOrderPartitioner (https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/lib/partition/TotalOrderPartitioner.html), which adds an extra step to the M/R pipeline: the keys are partitioned into sorted buckets, so that every key sent to one Reducer sorts before every key sent to the next.
The TreeMap solution does not sort globally; it only sorts the keys within each Reducer.
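To make the idea concrete, here is a minimal plain-Java sketch of range partitioning. It does not use any Hadoop classes, and the names (RangePartitionSketch, bucketFor, sortViaBuckets) and the split points are made up for illustration. Each bucket plays the role of one Reducer: it only sorts its own keys, yet concatenating the buckets in order gives a globally sorted result because the buckets themselves are ordered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: plain Java, no Hadoop classes involved.
final class RangePartitionSketch {

    // Returns the bucket index for a key, given sorted, distinct split points.
    // Keys below splitPoints[0] go to bucket 0; keys >= splitPoints[i] go to bucket i + 1.
    static int bucketFor(int key, int[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // Routes keys to buckets, sorts each bucket independently (as each Reducer would),
    // then concatenates the buckets in index order.
    static List<Integer> sortViaBuckets(List<Integer> keys, int[] splitPoints) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i <= splitPoints.length; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : keys) {
            buckets.get(bucketFor(key, splitPoints)).add(key);
        }
        List<Integer> result = new ArrayList<>();
        for (List<Integer> bucket : buckets) {
            Collections.sort(bucket);
            result.addAll(bucket);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(42, 7, 19, 88, 3, 55, 21, 10);
        int[] splitPoints = {10, 50}; // two split points give three buckets
        System.out.println(sortViaBuckets(keys, splitPoints));
        // prints [3, 7, 10, 19, 21, 42, 55, 88]
    }
}
```

With a hash-based partitioner the keys would be spread over the buckets without regard to order, so the concatenation step would not yield a sorted list even though each bucket is sorted.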
Here is a gist (not mine) showing how to use TotalOrderPartitioner: https://gist.github.com/asimjalis/e5627dc2ff2b23dac70b
The key takeaways from the gist are:
a) You need to call setPartitionerClass on the reduce job to set TotalOrderPartitioner as its partitioner:
// Use Total Order Partitioner.
reduceJob.setPartitionerClass(TotalOrderPartitioner.class);
b) You need to generate a set of split points, which define the "buckets" used by the TotalOrderPartitioner:
// Generate partition file from map-only job's output.
TotalOrderPartitioner.setPartitionFile(
    reduceJob.getConfiguration(), partitionPath);
InputSampler.writePartitionFile(
    reduceJob, new InputSampler.RandomSampler(1, 10000));
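For intuition about what the partition file holds, here is a simplified plain-Java sketch of deriving split points from a sample of keys. This is not Hadoop's InputSampler code; the class and method names (SplitPointSketch, chooseSplitPoints) are hypothetical and the sampling is deliberately naive. The idea is to sort the sample and take evenly spaced values, so that each bucket should receive a roughly equal share of the keys.

```java
import java.util.Arrays;
import java.util.Random;

// Illustration only: a simplified stand-in for sampling-based split selection.
final class SplitPointSketch {

    // Sorts a copy of the sample and picks numBuckets - 1 evenly spaced values.
    // Assumes sample.length >= 1 and numBuckets >= 1.
    // Duplicate split points are possible if the sample contains many repeated keys.
    static int[] chooseSplitPoints(int[] sample, int numBuckets) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] splitPoints = new int[numBuckets - 1];
        for (int i = 1; i < numBuckets; i++) {
            // Integer division keeps the index below sorted.length for i < numBuckets.
            splitPoints[i - 1] = sorted[i * sorted.length / numBuckets];
        }
        return splitPoints;
    }

    public static void main(String[] args) {
        // Draw a hypothetical random sample of 100 keys in the range [0, 1000).
        Random random = new Random();
        int[] sample = new int[100];
        for (int i = 0; i < sample.length; i++) {
            sample[i] = random.nextInt(1000);
        }
        // Three split points for four buckets (four Reducers).
        System.out.println(Arrays.toString(chooseSplitPoints(sample, 4)));
    }
}
```

The quality of the split depends on how representative the sample is: a skewed sample gives uneven buckets, and therefore unevenly loaded Reducers.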