When I write the mapreduce program, I often write the code like
job1.setMapOutputKeyClass(Text.class);
But why should we specify the MapOutputKeyClass explicitly? We have already spicify it in the mapper class such as
public static class MyMapper extends
Mapper<LongWritable, Text, Text, Text>
In the book Hadoop:The definitive Guide, there is a table shows that the method setMapOutputKeyClass is optional(Properties for configuring types), but as I test, I found it is necessary, or the Console of eclipse will show
Type mismatch in key from map: expected org.apache.hadoop.io.LongWritable, received org.apache.hadoop.io.Text
Can someone tell me the reason of it?
In the book, it says
"The settings that have to be compatible with the MapReduce types are listed in the lower part of Table 8-1". Does it mean we have to set the lower part property type, but do not have to set the higher part ones?
the content of the table looks like this:
Properties for configuring types:
mapreduce.job.inputformat.class
mapreduce.map.output.key.class
mapreduce.map.output.value.class
mapreduce.job.output.key.class
mapreduce.job.output.value.class
Properties that must be consistent with the types:
mapreduce.job.map.class
mapreduce.job.combine.class
mapreduce.job.partitioner.class
mapreduce.job.output.key.comparator.class
mapreduce.job.output.group.comparator.class
mapreduce.job.reduce.class
mapreduce.job.outputformat.class