Can the Hadoop Java reduce API ensure that the values of the same key are ordered?

Question

Consider the code below:

// Input:
// key=someword
// values: can they arrive as [2,1,6,4,8], or are they guaranteed to be ordered like [1,2,4,6,8]?
public void reduce(Text key, Iterable<IntWritable> values, Context context){
        //...
}

Can values arrive in an arbitrary order like [2,1,6,4,8], or are they guaranteed to be ordered like [1,2,4,6,8]?

Thanks for your answer!

It's a bit complex. However, you can google "secondary sort in Hadoop". There are many articles already. — zsxwing, Feb 26 '14 at 03:31
Thanks! Ordering by value is not required very often and can be expensive, I believe, which is probably why "Hadoop doesn't sort on values." I think I can just sort the list of values myself, though that may be less efficient. — Viky Leaf, Feb 26 '14 at 05:59
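The "sort the values myself" idea from the comment above can be sketched in plain Java, without any Hadoop dependency. The class name ValueSortSketch and the method sortedCopy are hypothetical, and Integer stands in for IntWritable. In a real reducer, the values would need to be copied out with v.get() first, since Hadoop reuses the same Writable instance while iterating; the whole list for one key also has to fit in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hadoop: buffers the values of one key and sorts them.
class ValueSortSketch {
    // Copies every value into a new list, then sorts the copy in ascending order.
    static List<Integer> sortedCopy(Iterable<Integer> values) {
        List<Integer> copy = new ArrayList<>();
        for (Integer v : values) {
            copy.add(v); // in a real reducer this would be v.get() on an IntWritable
        }
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Values in the arbitrary order from the question.
        System.out.println(sortedCopy(Arrays.asList(2, 1, 6, 4, 8))); // prints [1, 2, 4, 6, 8]
    }
}
```

This is fine for keys with few values. For keys with very many values, the buffering cost is the reason people turn to secondary sort instead.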
And yes, a Google search for "secondary sort in Hadoop" led me to http://stackoverflow.com/questions/18395998/hadoop-map-reduce-secondary-sorting, which solved my problem with a custom GroupPartitioner that makes the grouping step before reduce group by only part of the key. Thus I can put part of the value into the key: the records are sorted by it, but it is not part of the grouping criterion. That's it! Many thanks! — Viky Leaf, Feb 26 '14 at 06:06
Yes, it is possible, but it's a little tricky. Secondary sort is the solution for this: with it, Hadoop is able to sort the values too, according to your requirement. — Ashish Ratan, Feb 26 '14 at 07:11
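The secondary sort idea discussed in the comments can be illustrated with a small plain-Java simulation. This is only a sketch of the principle and not Hadoop code: the names SecondarySortSketch, CompositeKey, SORT_ORDER and shuffle are all hypothetical. The value is moved into a composite key, records are sorted on the full composite key, and grouping then looks only at the word part, so each group sees its values already in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of secondary sort; no Hadoop classes are used.
class SecondarySortSketch {
    // Composite key: the natural key (word) plus the value we want sorted.
    static final class CompositeKey {
        final String word;
        final int value;

        CompositeKey(String word, int value) {
            this.word = word;
            this.value = value;
        }
    }

    // Sort order over the full composite key: word first, then value.
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.comparing((CompositeKey k) -> k.word)
                      .thenComparingInt(k -> k.value);

    // Sorts on the composite key, then groups by the word part only.
    static Map<String, List<Integer>> shuffle(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : sorted) {
            // Grouping ignores k.value, so values keep their sorted order per word.
            groups.computeIfAbsent(k.word, w -> new ArrayList<>()).add(k.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = Arrays.asList(
                new CompositeKey("someword", 2), new CompositeKey("someword", 1),
                new CompositeKey("someword", 6), new CompositeKey("someword", 4),
                new CompositeKey("someword", 8));
        System.out.println(shuffle(mapOutput)); // prints {someword=[1, 2, 4, 6, 8]}
    }
}
```

In an actual job, the sorting and grouping roles would be played by comparators configured on the Job, and records with the same word would also need to be partitioned to the same reducer; the linked question covers those details.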

0 Answers