I am trying to use a case class as the key in an RDD and then call reduceByKey on it, but the values are not merged per key. When a tuple is used as the key, it works fine.
case class Employee(id: Int, name: String)
val e1 = Employee(1,"chan")
val e2 = Employee(1,"joey")
val salary = Array((e1,100),(e2,1100),(e1,190),(e2,110))
val salaryRDD = sc.parallelize(salary)
salaryRDD.reduceByKey(_+_).collect
Output:
res1: Array[(Employee, Int)] = Array((Employee(1,chan),100),
(Employee(1,chan),190), (Employee(1,joey),1100), (Employee(1,joey),110))
But when tuples are used as the keys, this works fine.
val t1 = (1,"chan")
val t2 = (1,"joey")
val salary2 = Array((t1,100),(t2,1100),(t1,190),(t2,110))
val salaryRDD2 = sc.parallelize(salary2)
salaryRDD2.reduceByKey(_+_).collect
Output:
res2: Array[((Int, String), Int)] = Array(((1,chan),290), ((1,joey),1210))
hashCode and equals work correctly for the case class:
scala> val em1 = Employee(1,"chan")
em1: Employee = Employee(1,chan)
scala> val em2 = Employee(1,"chan")
em2: Employee = Employee(1,chan)
scala> em1 == em2
res5: Boolean = true
scala> em1.hashCode
res6: Int = 545142355
scala> em2.hashCode
res7: Int = 545142355
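As a further sanity check outside Spark, here is a minimal sketch of the same aggregation on a plain Scala collection. `reduceByKeyLocal` is a hypothetical helper written only for this comparison (it is not a Spark API); it groups the pairs by key and reduces the values, which relies on nothing but the key's equals and hashCode.

```scala
// Same data as above, aggregated without Spark to isolate case class equality.
case class Employee(id: Int, name: String)

object LocalReduceByKey {
  // Hypothetical helper (not part of Spark): mimics reduceByKey on a local Seq
  // by grouping on the key and reducing the values of each group.
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  def main(args: Array[String]): Unit = {
    val e1 = Employee(1, "chan")
    val e2 = Employee(1, "joey")
    val salary = Seq((e1, 100), (e2, 1100), (e1, 190), (e2, 110))

    // Keys that are equal as case class instances end up in the same group.
    val totals = reduceByKeyLocal(salary)(_ + _)
    println(totals)
  }
}
```

Here the two Employee keys are merged, giving 290 for Employee(1,chan) and 1210 for Employee(1,joey), which matches the tuple result above. This suggests the problem is not with case class equality in general but with how the keys are compared when the job runs through Spark.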
Why does this happen? How can I make a case class work as a key with reduceByKey?