
I am encountering this error: java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to [Ljava.lang.Object; whenever I try to use `contains` to find whether a string is inside an array. Is there a more appropriate way of doing this, or am I doing something wrong? (I am fairly new to Scala.)

Here is the code:

val matches = Set[JSONObject]()
val config = new SparkConf()
val sc = new SparkContext("local", "SparkExample", config)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val ebay = sqlContext.read.json("/Users/thomassquires/Downloads/products.json")
val catalogue = sqlContext.read.json("/Users/thomassquires/Documents/catalogue2.json")

val eins = ebay.map(item => (item.getAs[String]("ID"), Option(item.getAs[Set[Row]]("itemSpecifics"))))
  .filter(item => item._2.isDefined)
  .map(item => (item._1, item._2.get.find(x => x.getAs[String]("k") == "EAN")))
  .filter(x => x._2.isDefined)
  .map(x => (x._1, x._2.get.getAs[String]("v")))
  .collect()

def catEins = catalogue
  .map(r => (r.getAs[String]("_id"), Option(r.getAs[Array[String]]("item_model_number"))))
  .filter(r => r._2.isDefined)
  .map(r => (r._1, r._2.get))
  .collect()

def matched = for (ein <- eins) yield (ein._1, catEins.filter(z => z._2.contains(ein._2)))

The exception occurs on the last line. I have tried a few different variants.

My data structures are a List[Tuple2[String, String]] and a List[Tuple2[String, Array[String]]]. For each string in the first list, I need to find the zero or more entries in the second list whose array contains it.
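To illustrate, this is the matching I'm after, written against plain Scala collections (with made-up sample data):

// the same matching on plain collections, no Spark involved
val eins    = List(("item1", "0885909950805"), ("item2", "1234567890123"))
val catEins = List(("cat1", Array("0885909950805", "9999999999999")))

// for each (id, ean), keep the catalogue entries whose array contains that ean
val matched = for ((id, ean) <- eins) yield (id, catEins.filter(_._2.contains(ean)))

This works fine on plain collections, which makes me suspect the Spark types.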

Thanks

  • Is there any specific reason you are collecting and then filtering? Because ideally you should always collect your end result. – Abhishek Anand Mar 21 '16 at 15:06
  • Mainly to pin down the error. Since it's lazy, I only get the error on collect. I wanted to rule out errors in the first two sets. – Tom Squires Mar 21 '16 at 15:29
  • Try annotating the types of all vals; it will also help others reason about your code. Btw, why are `matched` and `catEins` `def`s instead of `val`s? – Łukasz Mar 21 '16 at 15:37

1 Answer


Long story short (there is still a part that eludes me here*): you're using the wrong types. getAs is implemented as fieldIndex (String => Int), followed by get (Int => Any), followed by asInstanceOf.
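In other words, it behaves roughly like this (a sketch of the mechanism, not the actual Spark source; getAsSketch is a made-up name):

import org.apache.spark.sql.Row

// Roughly what row.getAs[T](fieldName) does. T is erased at runtime, so
// asInstanceOf[T] checks nothing here; the compiler inserts the real cast
// at the call site, where the value is actually used.
def getAsSketch[T](row: Row, fieldName: String): T =
  row.get(row.fieldIndex(fieldName)).asInstanceOf[T]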

Since Spark doesn't use Array or Set but WrappedArray to store array column data, calls like getAs[Array[String]] or getAs[Set[Row]] are not valid. If you want specific types you should use getAs[Seq[T]] (or getSeq[T] with a column index) and convert the data to the desired type with toSet / toArray.
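Applied to your code, that means asking for Seq and converting afterwards, along these lines (a sketch; the column names come from your snippet, and I haven't run this against your data):

val eins = ebay
  .map(item => (item.getAs[String]("ID"), Option(item.getAs[Seq[Row]]("itemSpecifics"))))
  .filter(_._2.isDefined)
  .map(item => (item._1, item._2.get.find(x => x.getAs[String]("k") == "EAN")))
  .filter(_._2.isDefined)
  .map(x => (x._1, x._2.get.getAs[String]("v")))
  .collect()

def catEins = catalogue
  .map(r => (r.getAs[String]("_id"),
             Option(r.getAs[Seq[String]]("item_model_number")).map(_.toArray)))
  .filter(_._2.isDefined)
  .map(r => (r._1, r._2.get))
  .collect()

With Seq[String] (or the converted Array[String]) in place, contains behaves as expected.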


* See Why wrapping a generic method call with Option defers ClassCastException?
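You can reproduce the deferred cast without Spark at all (a minimal sketch; this getAs is a stand-in for Row.getAs, and the lines are meant to be evaluated one at a time in the REPL):

// T is erased at runtime, so asInstanceOf[T] checks nothing inside the method
def getAs[T](a: Any): T = a.asInstanceOf[T]

val ok = getAs[Seq[String]](List("foo"))      // fine: a List is a Seq

val boom = getAs[Array[String]](List("foo"))  // ClassCastException: the compiler
                                              // inserts the array cast right here

val deferred = Option(getAs[Array[String]](List("foo")))  // no exception yet: Option.apply
                                                          // is generic, so no cast is inserted

deferred.get.contains("foo")  // blows up only here, when the value is finally used as an array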

– zero323