When using Hadoop streaming, the partitioner and sorter can be set and configured like this:
hadoop jar /opt/hadoop/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar \
-D mapreduce.map.output.key.field.separator=. \
-D stream.map.output.field.separator= \
-D stream.num.map.output.key.fields=2 \
-D num.key.fields.for.partition=2 \
-D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedComparator \
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
I would like to do the same thing in my Java main() method. The sorting can be achieved like this:
job.setSortComparatorClass(KeyFieldBasedComparator.class);
KeyFieldBasedComparator.setKeyFieldComparatorOptions(job, "-k 1,2");
The method setKeyFieldPartitionerOptions in the class KeyFieldBasedPartitioner, however, is not static:
KeyFieldBasedPartitioner partitioner = new KeyFieldBasedPartitioner();
partitioner.setKeyFieldPartitionerOptions(job, "-k 1,2");
On the job object, however, I can only set a class:
job.setPartitionerClass(KeyFieldBasedPartitioner.class);
How can the above options be set in this case? I could, of course, implement my own partitioner class, but why the effort if there should be a simple way?
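For reference, here is a minimal sketch of one way the two pieces might be combined. It assumes the new-API class org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedPartitioner (not the mapred.lib one used in the streaming command), and it assumes that setKeyFieldPartitionerOptions only writes the key spec into the job's Configuration, which the framework-created partitioner instance then reads back. If that holds, calling the method on a throwaway instance should be enough. The class name PartitionerSetup and the helper configure() are hypothetical names used only for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedComparator;
import org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedPartitioner;

public class PartitionerSetup {

    // Hypothetical helper: applies the sort and partition options to a job.
    static void configure(Job job) {
        // Sorting, as in the question: a static setter on the comparator class.
        job.setSortComparatorClass(KeyFieldBasedComparator.class);
        KeyFieldBasedComparator.setKeyFieldComparatorOptions(job, "-k 1,2");

        // Partitioning: register the class on the job ...
        job.setPartitionerClass(KeyFieldBasedPartitioner.class);
        // ... and use a throwaway instance only to store the options in the
        // job configuration (assumption: the setter has no other effect).
        new KeyFieldBasedPartitioner<Text, Text>()
                .setKeyFieldPartitionerOptions(job, "-k 1,2");
    }

    public static void main(String[] args) throws IOException {
        Job job = Job.getInstance(new Configuration(), "partitioner-setup");
        configure(job);
        // Print the stored option to check that it reached the configuration.
        System.out.println(
                job.getConfiguration().get(KeyFieldBasedPartitioner.PARTITIONER_OPTIONS));
    }
}
```

Under the same assumption, setting the property directly with job.getConfiguration().set(KeyFieldBasedPartitioner.PARTITIONER_OPTIONS, "-k 1,2") would presumably have the same effect without creating an instance.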