How do you enable the encoding of scala.Symbols in Spark Datasets?

The following small example:

import org.apache.spark.sql.SparkSession

object DatasetTest extends App {

  val spark: SparkSession = SparkSession
    .builder()
    .master("local[*]")
    .getOrCreate()

  import spark.implicits._
  case class RowType(symbol: Symbol, string: String)

  val key = Symbol("123")
  val ds1 = Seq(RowType(key, "some data")).toDS // throws at runtime, see below
  val ds2 = Seq(RowType(key, "other data")).toDS

  ds1.joinWith(ds2, ds1.col("symbol") === ds2.col("symbol")).show
}

It fails at runtime with:

Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for Symbol
- field (class: "scala.Symbol", name: "symbol")
- root class: "DatasetTest.RowType"

All of my attempts to provide an implicit encoder have failed.
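
For instance, a field-level implicit encoder (a sketch; symbolEncoder is just a name I picked) compiles, but the derived encoder for RowType seems to ignore it, and toDS throws the same exception:

import org.apache.spark.sql.{Encoder, Encoders}

// An implicit encoder for the Symbol field alone -- the product
// encoder derivation for RowType does not appear to pick it up.
implicit val symbolEncoder: Encoder[Symbol] = Encoders.javaSerialization[Symbol]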

For example, when using Kryo the Datasets were created, but the joinWith then failed to find the "symbol" column (it seems that Spark is not able to destructure the type when using Kryo).
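
The Kryo attempt looked roughly like this (again a sketch):

import org.apache.spark.sql.{Encoder, Encoders}

// Encode the whole row with Kryo. The Datasets are then created,
// but their schema collapses to a single binary "value" column,
// so ds1.col("symbol") no longer resolves in the joinWith.
implicit val rowTypeEncoder: Encoder[RowType] = Encoders.kryo[RowType]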

Any hints?

Please note that this question is much more specific than "How to store custom objects in Dataset?" and expects a more specific answer.
