I'm trying to look up values in a HashMap from inside a function in Spark 2.0, but the lookup fails when I parallelize the List. If I don't parallelize, it works, and if I don't use a case class as the key, it also works.
Here's some sample code of what I'm trying to do:
case class TestData(s: String)

def testKey(testData: TestData): Unit = {
  println(s"Current Map: $myMap")
  println(s"Key sent into function: $testData")
  println("Key isn't found in Map:")
  println(myMap(testData)) // fails here
}

val myList = sc.parallelize(List(TestData("foo")))
val myMap = Map(TestData("foo") -> "bar")

myList.collect.foreach(testKey) // collect so the println output shows up
Here's the exact output:
Current Map: Map(TestData(foo) -> bar)
Key sent into function: TestData(foo)
Key isn't found in Map:
java.util.NoSuchElementException: key not found: TestData(foo)
The code above is similar to what I'm actually trying to do, except that my real case class is more complicated and the HashMap has Lists as its values. Also, in the sample above I use collect so that the print statements are output; the sample gives the same error without collect, just without the prints.
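To be concrete about the shape, the real structure is closer to this sketch (the extra field and the values here are purely illustrative, not my actual data):

```scala
// Illustrative only: the real case class has more fields than this
case class TestData(s: String, n: Int)

// As in my real code, the map's values are Lists
val myMap: Map[TestData, List[String]] =
  Map(TestData("foo", 1) -> List("bar", "baz"))
```

Locally (without parallelize) a lookup like `myMap(TestData("foo", 1))` on this map works as expected.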
The hashCodes already match, but I also tried overriding equals and hashCode for the case class: same problem.
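For reference, the override I tried was roughly the standard structural one, so it shouldn't change behavior relative to the generated methods:

```scala
case class TestData(s: String) {
  // Structural hashCode based on the single field
  override def hashCode(): Int = s.hashCode

  // Structural equality: same type, same field value
  override def equals(other: Any): Boolean = other match {
    case that: TestData => this.s == that.s
    case _              => false
  }
}
```

With this definition, `TestData("foo") == TestData("foo")` still holds on the driver, yet the map lookup on the parallelized data fails the same way.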
This is running on Databricks, so I don't believe I have access to a standalone REPL or to spark-submit.