I ran into a problem I can't explain. When I run the code below locally from an IDE such as IntelliJ, it prints true.
    import org.apache.spark.{SparkConf, SparkContext}

    val sparkConf = new SparkConf().setAppName("QueryMySql").setMaster("local")
    val sc = new SparkContext(sparkConf)

    case class Store(val name: String) {
      // Two stores are equal when their names are equal.
      override def equals(o: Any) = o match {
        case that: Store => that.name.equals(this.name)
        case _ => false
      }
      override def hashCode = name.hashCode
    }

    val storeAddressList = List(
      (Store("Candy"), "dongilStreet 1"),
      (Store("Choco"), "kangnam Street 2"),
      (Store("Choco"), "bongchen Street 3"),
      (Store("Icecream"), "samsung street 4")
    )
    val storeAddress = sc.parallelize(storeAddressList)

    val storeRatingList = List(
      (Store("Candy"), 4.9),
      (Store("Choco"), 4.8)
    )
    val storeRating = sc.parallelize(storeRatingList)

    storeAddress.collect
    storeRating.collect
    println(storeAddress.first._1.equals(storeRating.first._1))
However, I then ran the same code in spark-shell, with the SparkConf and SparkContext setup lines removed:
    case class Store(val name: String) {
      override def equals(o: Any) = o match {
        case that: Store => that.name.equals(this.name)
        case _ => false
      }
      override def hashCode = name.hashCode
    }

    val storeAddressList = List(
      (Store("Candy"), "dongilStreet 1"),
      (Store("Choco"), "kangnam Street 2"),
      (Store("Choco"), "bongchen Street 3"),
      (Store("Icecream"), "samsung street 4")
    )
    val storeAddress = sc.parallelize(storeAddressList)

    val storeRatingList = List(
      (Store("Candy"), 4.9),
      (Store("Choco"), 4.8)
    )
    val storeRating = sc.parallelize(storeRatingList)

    storeAddress.collect
    storeRating.collect
    println(storeAddress.first._1.equals(storeRating.first._1))
and the result is false.
To find the cause, I tried the following. First, I checked storeRating.first._1. storeAddress has 4 partitions and each of them holds a value, whereas storeRating holds only two values, so I suspected that some of its partitions were empty and that the comparison was effectively

    Store(Candy).equals(null)

But that was wrong: both sides have a value.
Second, I suspected the hashCode of the Store case class, but both objects have the same hash code.

    scala> println(storeAddress.first._1.hashCode())
    64874565
    scala> println(storeRating.first._1.hashCode())
    64874565
Third, I checked the values in storeAddress and storeRating, and they are the same as well.
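For comparison, here is a minimal Spark-free sketch of the same equality check. It is only a sanity check under stated assumptions: the object name StoreEqualityCheck and the variable names are hypothetical, and it is compiled as an ordinary program rather than typed into spark-shell. It suggests that the overridden equals and hashCode behave as intended on their own, so the difference presumably comes from how the objects are handled in spark-shell rather than from the class body itself.

```scala
// Hypothetical standalone check: plain Scala, no Spark and no REPL.
// Only Store and its equals/hashCode come from the code above.
object StoreEqualityCheck {
  case class Store(name: String) {
    // Same rule as above: stores are equal when their names are equal.
    override def equals(o: Any): Boolean = o match {
      case that: Store => that.name == this.name
      case _           => false
    }
    override def hashCode: Int = name.hashCode
  }

  def main(args: Array[String]): Unit = {
    val fromAddressList = Store("Candy")
    val fromRatingList  = Store("Candy")

    println(fromAddressList == fromRatingList)                   // true
    println(fromAddressList.hashCode == fromRatingList.hashCode) // true
    println(fromAddressList == Store("Choco"))                   // false
  }
}
```

Running the same three comparisons on storeAddress.first._1 and storeRating.first._1 in both environments would show exactly which of them changes between the IDE and spark-shell.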
Please help me find the cause of this bizarre behavior.