
I'm getting a weird error whenever I re-create the Spark SQL context (delete it and create it again): running the job a second time, or any time after that, always throws this exception:

[2016-09-20 13:52:28,743] ERROR .jobserver.JobManagerActor [] [akka://JobServer/user/context-supervisor/ctx] - Exception from job 23fe1335-55ec-47b2-afd3-07396483eae0:
java.lang.RuntimeException: Error while encoding: java.lang.ClassCastException: org.lala.Country cannot be cast to org.lala.Country
staticinvoke(class org.apache.spark.unsafe.types.UTF8String,StringType,fromString,invoke(input[0, ObjectType(class org.lala.Country)],code,ObjectType(class java.lang.String)),true) AS code#10
+- staticinvoke(class org.apache.spark.unsafe.types.UTF8String,StringType,fromString,invoke(input[0, ObjectType(class org.lala.Country)],code,ObjectType(class java.lang.String)),true)
   +- invoke(input[0, ObjectType(class org.lala.Country)],code,ObjectType(class java.lang.String))
      +- input[0, ObjectType(class org.lala.Country)]

        at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:220)
        at org.apache.spark.sql.SQLContext$$anonfun$8.apply(SQLContext.scala:504)
        at org.apache.spark.sql.SQLContext$$anonfun$8.apply(SQLContext.scala:504)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:504)
        at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:141)
        at org.lala.HelloJob$.runJob(HelloJob.scala:18)
        at org.lala.HelloJob$.runJob(HelloJob.scala:13)
        at spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.scala:301)

My Spark class:

import com.typesafe.config.Config
import org.apache.spark.sql.SQLContext
import spark.jobserver.{SparkJobValid, SparkJobValidation, SparkSqlJob}

case class Country(code: String)

object TestJob extends SparkSqlJob {
  override def runJob(sc: SQLContext, jobConfig: Config): Any = {
    import sc.implicits._

    val country = List(Country("A"), Country("B"))
    val countryDS = country.toDS()   // throws the ClassCastException on the second run
    countryDS.collect().foreach(println)
  }

  override def validate(sc: SQLContext, config: Config): SparkJobValidation = {
    SparkJobValid
  }
}

I'm using:

  • Spark 1.6.1
  • Spark Job Server 0.6.2 (docker)
  • I am facing the same issue. Did you find a workaround? – faizan Feb 03 '17 at 09:34
  • No, it seems SJS doesn't really support the Dataset API, so I switched my code back to the DataFrame API (a sketch of that workaround follows these comments). – NoodleX Feb 03 '17 at 10:00
  • I was trying this from a Jupyter notebook after installing the Apache Toree kernel. Is there some way I can tell whether I was using SJS? I think Jupyter uses spark-shell. Side question: have you tried this in spark-shell? Anyway, it seems I have to go back to the DataFrame API as well. So much for type safety – faizan Feb 03 '17 at 10:13
  • I am also facing this issue. I have similar Spark code which runs in spark-shell but not in `Jupyter` with `Apache Toree`. Still no workaround? – Ahmet DAL Jul 22 '17 at 22:29
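
For reference, here is a minimal sketch of the DataFrame-based workaround NoodleX describes above. The TestJobDF name is hypothetical; it reuses the Country case class and the job-server traits from the question. In Spark 1.6, toDF() on a local Seq goes through runtime row conversion rather than the generated Dataset encoder, which is the code path failing in the stack trace:

import com.typesafe.config.Config
import org.apache.spark.sql.SQLContext
import spark.jobserver.{SparkJobValid, SparkJobValidation, SparkSqlJob}

// Hypothetical name; same shape as TestJob above, but built on the
// DataFrame API instead of the Dataset API, as suggested in the comments.
object TestJobDF extends SparkSqlJob {
  override def runJob(sc: SQLContext, jobConfig: Config): Any = {
    import sc.implicits._

    val country = List(Country("A"), Country("B"))
    val countryDF = country.toDF()   // DataFrame: avoids the ExpressionEncoder path
    countryDF.collect().foreach(println)
  }

  override def validate(sc: SQLContext, config: Config): SparkJobValidation = {
    SparkJobValid
  }
}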

0 Answers