Using Spark 2, I want to use a case class instance as the key of a tuple for a join or reduceByKey operation, but as far as I have been able to tell it does not work. See the example below: bob should end up with a count of 2, but the two P(bob) keys are treated as distinct, as you can see at the end:

case class P(name: String) {
  override def hashCode: Int = name.hashCode()

  override def equals(obj: Any): Boolean = obj match {
    case x: P => x.name.contentEquals(this.name)
    case _    => false
  }

  implicit def ===(obj: Any): Boolean = obj match {
    case x: P => x.name.contentEquals(this.name)
    case _    => false
  }
}

// Exiting paste mode, now interpreting.

warning: there was one feature warning; re-run with -feature for details
defined class P

val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
ps: Array[P] = Array(P(alice), P(bob), P(charly), P(bob))

sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect
res9: Array[(P, Int)] = Array((P(alice),1), (P(charly),1), (P(bob),1), (P(bob),1))
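For reference, outside Spark the overridden equality and hash code behave exactly as I expect. A minimal local check (plain Scala, no Spark, using the same P definition as above) aggregates the two bob keys into one:

```scala
// Same P definition as above, checked locally without Spark.
case class P(name: String) {
  override def hashCode: Int = name.hashCode()
  override def equals(obj: Any): Boolean = obj match {
    case x: P => x.name.contentEquals(this.name)
    case _    => false
  }
}

val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))

// groupBy relies on hashCode/equals, just like reduceByKey's shuffle does,
// so the two P("bob") instances collapse into a single key here.
val counts = ps.groupBy(identity).map { case (k, v) => (k, v.length) }

println(counts(P("bob"))) // prints 2, which is what I expect from reduceByKey
```

So the key semantics of P seem correct on their own; the unexpected splitting only shows up in the Spark job.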
subhankar

0 Answers