
When using Hadoop streaming, the partitioner and sorter can be set and configured like this:

hadoop jar /opt/hadoop/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar \
-D mapreduce.map.output.key.field.separator=. \
-D stream.map.output.field.separator= \
-D stream.num.map.output.key.fields=2 \
-D num.key.fields.for.partition=2 \
-D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedComparator \
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner    
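To illustrate what the partitioner options above ask for (keys split on ".", with only the first two fields used to pick a reducer), here is a simplified Java model that does not depend on Hadoop. It is not Hadoop's KeyFieldBasedPartitioner. KeyFieldPartitionModel and its methods are hypothetical names, and the model hashes with String.hashCode(), so the actual reducer numbers will differ from Hadoop's. Only the grouping property carries over: keys that share their first two fields land on the same reducer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/**
 * Hypothetical, Hadoop-free model of key-field partitioning: split the key on a
 * separator and hash only the first numFields fields. This is NOT Hadoop's
 * KeyFieldBasedPartitioner; it uses String.hashCode(), so reducer numbers differ.
 */
public class KeyFieldPartitionModel {

    /** Returns the first numFields fields of key, re-joined with the separator. */
    public static String partitionKey(String key, String separator, int numFields) {
        // Pattern.quote so that "." is treated literally, not as a regex wildcard.
        String[] fields = key.split(Pattern.quote(separator), -1);
        int n = Math.min(numFields, fields.length);
        return String.join(separator, Arrays.copyOfRange(fields, 0, n));
    }

    /** Maps a key to a reducer index in [0, numReducers). */
    public static int partition(String key, String separator, int numFields, int numReducers) {
        int hash = partitionKey(key, separator, numFields).hashCode();
        // Clear the sign bit so the modulo result is never negative.
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // The first two keys share "2015.10", so they always go to the same reducer.
        for (String k : new String[] {"2015.10.a", "2015.10.b", "2015.11.a"}) {
            System.out.println(k + " -> '" + partitionKey(k, ".", 2)
                + "' -> reducer " + partition(k, ".", 2, 4));
        }
    }
}
```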

I would like to do the same thing in my Java main() method. The sorting can be achieved like this:

job.setSortComparatorClass(KeyFieldBasedComparator.class);    
KeyFieldBasedComparator.setKeyFieldComparatorOptions(job, "-k 1,2");
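The "-k 1,2" spec limits the comparison to the span from field 1 through field 2 of the key, so any later fields are ignored. Below is a minimal sketch of that ordering that does not use Hadoop. It assumes "." as the separator (as in the streaming command) and plain lexicographic comparison. KeyFieldSortModel and its members are hypothetical names and are not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, Hadoop-free model of sorting with a "-k 1,2" key spec:
 * only fields 1..2 of the key are compared. Not Hadoop's KeyFieldBasedComparator.
 */
public class KeyFieldSortModel {

    private static final String SEPARATOR = ".";

    /** Extracts fields 1..2 (1-based, inclusive) of a key. */
    public static String sortSpan(String key) {
        String[] fields = key.split(Pattern.quote(SEPARATOR), -1);
        return String.join(SEPARATOR, Arrays.copyOfRange(fields, 0, Math.min(2, fields.length)));
    }

    /** Compares keys lexicographically on fields 1..2 only. */
    public static final Comparator<String> FIRST_TWO_FIELDS =
        Comparator.comparing(KeyFieldSortModel::sortSpan);

    /** Returns a sorted copy; List.sort is stable, so tied keys keep input order here. */
    public static List<String> sorted(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort(FIRST_TWO_FIELDS);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList("b.1.x", "a.2.y", "a.1.z", "a.1.a")));
    }
}
```

With the sample keys, this prints [a.1.z, a.1.a, a.2.y, b.1.x]. The third field is ignored, so a.1.z and a.1.a compare as equal and keep their input order only because List.sort is stable. In a real Hadoop job, the relative order of such tied keys should not be relied upon.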

The method setKeyFieldPartitionerOptions in the class KeyFieldBasedPartitioner, however, is not static:

KeyFieldBasedPartitioner partitioner = new KeyFieldBasedPartitioner();
partitioner.setKeyFieldPartitionerOptions(job, "-k 1,2");

On the Job object, however, I can only set a class:

job.setPartitionerClass(KeyFieldBasedPartitioner.class); 

How can the partitioner options above be set in this case? I could, of course, implement my own partitioner class, but why go to that effort if there is a simpler way?
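A point worth checking against the Hadoop version in use: in the Hadoop 2.x sources, the instance method setKeyFieldPartitionerOptions(Job, String) does not store anything in the partitioner object. Instead, it writes the spec into the job's Configuration under the key mapreduce.partition.keypartitioner.options (the constant KeyFieldBasedPartitioner.PARTITIONER_OPTIONS). The framework then creates its own partitioner from the class registered with setPartitionerClass. Because KeyFieldBasedPartitioner implements Configurable, that new instance reads the key in setConf. If this holds, calling the setter on a throwaway instance, as in the snippet above, together with job.setPartitionerClass(KeyFieldBasedPartitioner.class) should be enough. The sketch below models only that pattern and does not use Hadoop. ConfigPatternModel, FakeJob and ModelPartitioner are hypothetical stand-ins, a Map plays the role of Configuration, and a Supplier replaces reflective instantiation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hadoop-free model of "options live in the job configuration, not in the
 * partitioner instance". FakeJob and ModelPartitioner are hypothetical stand-ins
 * for Job and a Configurable partitioner; only the option key mirrors Hadoop 2.x.
 */
public class ConfigPatternModel {

    public static final String PARTITIONER_OPTIONS = "mapreduce.partition.keypartitioner.options";

    /** Stand-in for Job: its configuration is what reaches the tasks. */
    public static class FakeJob {
        public final Map<String, String> conf = new HashMap<>();
        // Stand-in for job.setPartitionerClass(...): how to build a fresh partitioner.
        public Supplier<ModelPartitioner> partitionerFactory;
    }

    /** Stand-in for a partitioner that is configured after the framework creates it. */
    public static class ModelPartitioner {
        public String keySpec; // set by setConf(), never by the driver

        /** Instance method that only writes into the job's configuration. */
        public void setKeyFieldPartitionerOptions(FakeJob job, String spec) {
            job.conf.put(PARTITIONER_OPTIONS, spec);
        }

        /** Called on the framework-created instance with the job's configuration. */
        public void setConf(Map<String, String> conf) {
            this.keySpec = conf.get(PARTITIONER_OPTIONS);
        }
    }

    /** Mimics the task side: create a new partitioner, then hand it the configuration. */
    public static ModelPartitioner instantiateForTask(FakeJob job) {
        ModelPartitioner p = job.partitionerFactory.get();
        p.setConf(job.conf);
        return p;
    }

    public static void main(String[] args) {
        FakeJob job = new FakeJob();
        job.partitionerFactory = ModelPartitioner::new;
        // The throwaway instance exists only to write the option into job.conf.
        new ModelPartitioner().setKeyFieldPartitionerOptions(job, "-k 1,2");
        System.out.println(instantiateForTask(job).keySpec); // prints -k 1,2
    }
}
```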

irondwarf
  • Are you using Job class or JobConf? – Vinkal Oct 25 '15 at 09:02
  • I am using the Job class as far as I know. Newest Hadoop version. – irondwarf Oct 25 '15 at 13:44
  • check if this article is useful for you : http://blog.zaloni.com/secondary-sorting-in-hadoop – Ravindra babu Nov 08 '15 at 13:06
  • Unfortunately, the link uses a custom implementation of the Comparable classes. That did the trick for me as well, but it was a lot more work than if I could have used the already implemented classes. I am therefore still interested in an answer. But thanks for the link anyway! – irondwarf Nov 09 '15 at 14:06

0 Answers