I've always felt that case class equality behaves very well... until today!
In the Spark shell, I ran the following:
scala> case class TV(value:Int)
defined class TV
scala> sc.parallelize(Seq((TV(1),"a"),(TV(1),"b"))).map(_._1).countByValue()
res37: scala.collection.Map[TV,Long] = Map(TV(1) -> 1, TV(1) -> 1)
I would have expected both TV values to get rolled up as the same key.
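For comparison, here is a minimal sketch of the behaviour I expected, using plain Scala collections with no Spark involved. The helper name countByValueLocal and the object CaseClassEqualityCheck are my own, made up for this illustration; they are not part of any Spark API.

```scala
// Plain Scala, no Spark: a sketch of the case class equality I was relying on.
case class TV(value: Int)

object CaseClassEqualityCheck {
  // Hypothetical local stand-in for countByValue, built on ordinary collections.
  // It groups equal elements together and counts the size of each group.
  def countByValueLocal[A](xs: Seq[A]): Map[A, Long] =
    xs.groupBy(identity).map { case (k, vs) => k -> vs.size.toLong }

  def main(args: Array[String]): Unit = {
    val a = TV(1)
    val b = TV(1)
    println(a == b)                       // structural equality: true
    println(a.hashCode == b.hashCode)     // equal values hash alike: true
    println(countByValueLocal(Seq(a, b))) // Map(TV(1) -> 2)
  }
}
```

Compiled as an ordinary program, the two TV(1) instances are equal, hash alike, and collapse into a single key, which is why the Spark result surprised me.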
If I pull the value out of the TV, it behaves as expected:
scala> sc.parallelize(Seq((TV(1),"a"),(TV(1),"b"))).map(_._1.value).countByValue()
res38: scala.collection.Map[Int,Long] = Map(1 -> 2)
What's going on here?