
I'm new to Hadoop and am creating my first MapReduce job. While writing the driver class (the one with the main method), I came across these two methods:

setMapOutputKeyClass
setMapOutputValueClass 

As I understand it, these two methods only need to be called when the mapper produces output types that differ from the reducer's output types. Otherwise, the types set with setOutputKeyClass() and setOutputValueClass() are used by default for both the mapper's output and the reducer's output.
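The fallback described above can be sketched in plain Java. This is a simplified model under stated assumptions, not Hadoop's own code: `JobTypesSketch` and `checkMapOutput` are hypothetical names, and `String`, `Integer`, and `Double` stand in for Writable types such as `Text` and `IntWritable`. It also ignores what Hadoop does when the output classes are never set at all. It only illustrates that the map output type falls back to the job output type unless it is set explicitly, and that a record emitted by the mapper is checked against that type.

```java
// Simplified stand-in for a job's type settings. NOT Hadoop's own class.
final class JobTypesSketch {
    private Class<?> outputKeyClass;      // final (reducer) output key type
    private Class<?> outputValueClass;    // final (reducer) output value type
    private Class<?> mapOutputKeyClass;   // stays null until set explicitly
    private Class<?> mapOutputValueClass; // stays null until set explicitly

    void setOutputKeyClass(Class<?> c) { outputKeyClass = c; }
    void setOutputValueClass(Class<?> c) { outputValueClass = c; }
    void setMapOutputKeyClass(Class<?> c) { mapOutputKeyClass = c; }
    void setMapOutputValueClass(Class<?> c) { mapOutputValueClass = c; }

    // Fall back to the job output type when no map output type was set.
    Class<?> getMapOutputKeyClass() {
        return mapOutputKeyClass != null ? mapOutputKeyClass : outputKeyClass;
    }

    Class<?> getMapOutputValueClass() {
        return mapOutputValueClass != null ? mapOutputValueClass : outputValueClass;
    }

    // Hypothetical check: models rejecting a mapper record of the wrong type.
    void checkMapOutput(Object key, Object value) {
        if (key.getClass() != getMapOutputKeyClass()) {
            throw new IllegalStateException("Type mismatch in key from map: expected "
                    + getMapOutputKeyClass().getName() + ", received " + key.getClass().getName());
        }
        if (value.getClass() != getMapOutputValueClass()) {
            throw new IllegalStateException("Type mismatch in value from map: expected "
                    + getMapOutputValueClass().getName() + ", received " + value.getClass().getName());
        }
    }

    public static void main(String[] args) {
        JobTypesSketch job = new JobTypesSketch();
        job.setOutputKeyClass(String.class);
        job.setOutputValueClass(Double.class);

        // No map output types set: the mapper is expected to emit (String, Double).
        System.out.println(job.getMapOutputValueClass().getSimpleName()); // Double

        // The mapper actually emits (String, Integer), so declare that explicitly.
        job.setMapOutputValueClass(Integer.class);
        System.out.println(job.getMapOutputValueClass().getSimpleName()); // Integer

        job.checkMapOutput("word", 1); // passes: matches (String, Integer)
    }
}
```

In this model, leaving out `setMapOutputValueClass(Integer.class)` would make `checkMapOutput("word", 1)` fail, because the expected value type would still be `Double`. The reducer's final output types are unaffected either way.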

My questions about setMapOutputKeyClass and setMapOutputValueClass are:

  1. Why are these two methods required? What is their actual purpose?
  2. If the reducer's output types differ from the mapper's output types, what does setting the types with these two methods actually do?
  3. If the types are different, what else is affected?

Thanks

Joy
  • Possible duplicate of [Why setMapOutputKeyClass method is necessary in mapreduce job](https://stackoverflow.com/questions/38376688/why-setmapoutputkeyclass-method-is-necessary-in-mapreduce-job) – Binary Nerd Jan 16 '18 at 08:20
  • @BinaryNerd, I have seen that post, but it doesn't explain why these methods are needed or what their purpose is. It makes sense that the mapper's output types must match the reducer's input types, because otherwise the job will not work. But why should the reducer's output types also match? The reducer is the end point, so if its output types don't match the mapper's output types, what is the problem? Why do the map output types need to be set explicitly using setMapOutputValueClass? – Joy Jan 21 '18 at 15:37
