Using Spark 2, I want to use a case class instance as the key of a pair RDD for join or reduceByKey operations, but it does not work the way I expected. In the example below, the two P("bob") keys should have been reduced to a single (P(bob), 2) entry, but each is treated as a distinct key, as the final output shows:
case class P(name: String) {
  override def hashCode: Int = name.hashCode()
  override def equals(obj: Any): Boolean = {
    obj match {
      case x: P => x.name.contentEquals(this.name)
      case _    => false
    }
  }
  implicit def ===(obj: Any): Boolean = {
    obj match {
      case x: P => x.name.contentEquals(this.name)
      case _    => false
    }
  }
}
// Exiting paste mode, now interpreting.
warning: there was one feature warning; re-run with -feature for details
defined class P
val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
ps: Array[P] = Array(P(alice), P(bob), P(charly), P(bob))
sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect
res9: Array[(P, Int)] = Array((P(alice),1), (P(charly),1), (P(bob),1), (P(bob),1))
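For comparison, the same equality and hashing behave as expected in plain Scala outside the Spark shell. This is a minimal sketch (assuming the same P definition, with the implicit === dropped since it plays no role in key comparison) that groups by the key the same way reduceByKey would, using equals and hashCode:

```scala
// Plain-Scala check: the overridden equals/hashCode do agree, so two
// P("bob") instances compare equal and hash identically.
case class P(name: String) {
  override def hashCode: Int = name.hashCode()
  override def equals(obj: Any): Boolean = obj match {
    case x: P => x.name.contentEquals(this.name)
    case _    => false
  }
}

object Check extends App {
  val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
  // groupBy relies on equals/hashCode, analogous to how keys are
  // compared in a reduce-by-key step
  val counts = ps.groupBy(identity).map { case (k, v) => (k, v.length) }
  println(counts) // here P(bob) maps to 2, unlike the Spark result above
}
```

Note that case classes already generate a structural equals and hashCode, so the overrides are redundant in plain Scala; they were added here only while trying to work around the Spark behavior.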