I'm new to Hadoop and am writing my first MapReduce job. While writing the driver class (the one with the main method), I came across two methods:
setMapOutputKeyClass
setMapOutputValueClass
As I understand it, these two methods are needed only when the mapper emits different key/value types from the reducer's output types. Otherwise, the types set with setOutputKeyClass()
and setOutputValueClass()
are used for both the mapper's and the reducer's output.
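To check my understanding of that fallback rule, here is a minimal, Hadoop-free sketch of it. It assumes the behaviour of the old-API JobConf, where an unset map output class resolves to the job's final output class, and where the final output classes default to LongWritable and Text. MapOutputTypeModel, its configuration keys, and the use of plain strings in place of real Writable classes are all hypothetical simplifications, not the actual Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of how a job resolves its map output types.
 * Strings stand in for Hadoop Writable classes; this is NOT
 * org.apache.hadoop.mapred.JobConf, only a sketch of its fallback rule.
 */
public class MapOutputTypeModel {
    // Hypothetical config keys, simplified from Hadoop's real property names.
    private final Map<String, String> conf = new HashMap<>();

    void setOutputKeyClass(String c) { conf.put("output.key.class", c); }
    void setOutputValueClass(String c) { conf.put("output.value.class", c); }
    void setMapOutputKeyClass(String c) { conf.put("map.output.key.class", c); }
    void setMapOutputValueClass(String c) { conf.put("map.output.value.class", c); }

    // Final (reducer) output types; defaults mirror JobConf's LongWritable/Text.
    String getOutputKeyClass() { return conf.getOrDefault("output.key.class", "LongWritable"); }
    String getOutputValueClass() { return conf.getOrDefault("output.value.class", "Text"); }

    // Intermediate (mapper) output types fall back to the final output types when unset.
    String getMapOutputKeyClass() {
        String c = conf.get("map.output.key.class");
        return c != null ? c : getOutputKeyClass();
    }

    String getMapOutputValueClass() {
        String c = conf.get("map.output.value.class");
        return c != null ? c : getOutputValueClass();
    }

    public static void main(String[] args) {
        // Case 1: mapper and reducer emit the same types, so only setOutput*Class is called.
        MapOutputTypeModel same = new MapOutputTypeModel();
        same.setOutputKeyClass("Text");
        same.setOutputValueClass("IntWritable");
        System.out.println(same.getMapOutputKeyClass() + " " + same.getMapOutputValueClass());

        // Case 2: mapper emits (Text, IntWritable) but reducer emits (Text, DoubleWritable),
        // so the map output types must be set explicitly.
        MapOutputTypeModel diff = new MapOutputTypeModel();
        diff.setOutputKeyClass("Text");
        diff.setOutputValueClass("DoubleWritable");
        diff.setMapOutputKeyClass("Text");
        diff.setMapOutputValueClass("IntWritable");
        System.out.println(diff.getMapOutputKeyClass() + " " + diff.getMapOutputValueClass());
    }
}
```

In a real driver, case 2 would correspond to calls such as job.setOutputValueClass(DoubleWritable.class) together with job.setMapOutputValueClass(IntWritable.class) on an org.apache.hadoop.mapreduce.Job.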
My questions about setMapOutputKeyClass
and setMapOutputValueClass
are:
- Why are these two methods required? What is their actual purpose?
- If the reducer's output types differ from the mapper's output types, what exactly does setting the types with these two methods accomplish?
- If the types differ, what else is affected?
Thanks