I've always felt that case class equality behaves very well... until today!

In the Spark shell, I ran the following:

scala> case class TV(value:Int)
defined class TV

scala> sc.parallelize(Seq((TV(1),"a"),(TV(1),"b"))).map(_._1).countByValue()
res37: scala.collection.Map[TV,Long] = Map(TV(1) -> 1, TV(1) -> 1)

I would have expected both TV values to get rolled up as the same key.
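
To make that expectation concrete, here is a minimal sketch of the same roll-up on a plain local collection, with no Spark and no REPL involved. `LocalCountByValue` and its `countByValue` helper are names made up for this illustration (they are not Spark APIs), and the block is meant to be compiled as an ordinary Scala file rather than pasted into the shell.

```scala
// Plain Scala, no Spark: a local sanity check of case class equality.
case class TV(value: Int)

object LocalCountByValue {
  // Hypothetical helper (not a Spark API): counts how often each
  // distinct element occurs, relying on equals/hashCode of the element.
  def countByValue[A](xs: Seq[A]): Map[A, Long] =
    xs.groupBy(identity).map { case (k, vs) => k -> vs.size.toLong }

  def main(args: Array[String]): Unit = {
    val keys = Seq((TV(1), "a"), (TV(1), "b")).map(_._1)

    println(TV(1) == TV(1))                   // structural equality
    println(TV(1).hashCode == TV(1).hashCode) // hash codes agree
    println(countByValue(keys))               // both keys land in one entry
  }
}
```

In compiled code like this, the two `TV(1)` instances compare equal and collapse into a single map entry with a count of 2, which is the behavior I expected from the RDD version as well.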

If I pull the value out of the TV, it behaves as expected:

scala> sc.parallelize(Seq((TV(1),"a"),(TV(1),"b"))).map(_._1.value).countByValue()
res38: scala.collection.Map[Int,Long] = Map(1 -> 2)

What's going on here?

Larsenal
