I want to sort a big dataset efficiently (i.e. with a custom partitioner, as described here: How does the MapReduce sort algorithm work?), but I want to do it with Hive.
However, the Hive manual states that "order by" is performed by a single reducer. This surprises me, since Pig does implement something similar to what the article describes: pig impl
Am I missing something, or is Hive simply not the right hammer for this job?
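
To be concrete about the kind of sort meant here: the idea is to sample the keys, pick split points from the sample, route each record to a reducer by key range, and let every reducer sort its own range, so that concatenating the reducer outputs gives a globally sorted result. Below is a minimal single-process Python sketch of that idea. It is not Hive, Pig, or Hadoop code; the function names, the sample size, and the in-memory "reducers" are all assumptions made up for illustration.

```python
import bisect
import random


def pick_split_points(keys, num_reducers, sample_size=1000, seed=0):
    """Sample the keys and choose num_reducers - 1 range boundaries."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    # Evenly spaced cut points in the sorted sample approximate quantiles.
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]


def partition(key, split_points):
    """Custom partitioner: map a key to the reducer that owns its range."""
    return bisect.bisect_right(split_points, key)


def total_order_sort(keys, num_reducers=4):
    split_points = pick_split_points(keys, num_reducers)
    buckets = [[] for _ in range(num_reducers)]
    for key in keys:
        buckets[partition(key, split_points)].append(key)
    # Each "reducer" sorts only its own range; no single reducer sees everything.
    sorted_buckets = [sorted(bucket) for bucket in buckets]
    # Ranges are ordered, so plain concatenation is globally sorted.
    return [key for bucket in sorted_buckets for key in bucket]
```

Because the buckets cover disjoint, ordered key ranges, no final merge step is needed. How balanced the buckets are depends on how well the sample reflects the key distribution.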