Let's say I have a recursive case class (meaning it references itself):
case class Dummy(id: Int, children: List[Dummy])
I cannot use Spark's default encoder with it, because the default encoder doesn't support circular references:
java.lang.UnsupportedOperationException: cannot have circular references in class, but got the circular reference of class Dummy
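For context, here is a minimal reproduction of that failure (a sketch, assuming a SparkSession named spark):

import spark.implicits._
// The implicit product encoder cannot be derived for a self-referencing type,
// so this throws the UnsupportedOperationException above at toDS():
val broken = Seq(Dummy(1, Nil)).toDS()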
So I need to use a custom encoder such as Kryo, which supports recursive case classes, like below:
import org.apache.spark.sql.{Encoder, Encoders}

val myEncoder: Encoder[Dummy] = Encoders.kryo[Dummy]
val ds = spark.createDataset(Seq(Dummy(1, null)))(myEncoder)
ds.createOrReplaceTempView("myTable")
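For example, typed Dataset operations go through the Kryo deserializer without issue (a quick sanity check; Encoders.scalaInt is passed explicitly rather than assuming an implicit Int encoder is in scope):

// Projecting a field out of each Dummy works fine with the Dataset API:
ds.map(_.id)(Encoders.scalaInt).show()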
This works fine, and I can operate on the Dataset without problems, but what I need is to write Spark SQL against the fields of the case class, like below:
%%sql
select id, size(children) as childCount from myTable
But I got this error:
org.apache.spark.sql.AnalysisException: cannot resolve '`id`' given input columns: [myTable.value]; line 1 pos 7;
That's because Kryo serializes the whole object into a single binary column, so the schema is:
value: binary (nullable = true)
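A quick way to confirm this (using the ds created above):

ds.printSchema()
// root
//  |-- value: binary (nullable = true)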
Note: this problem looks similar to spark-using-recursive-case-class, but it is not the same. I need a way to write SQL that operates on the case class fields, instead of using the Dataset API.
I have already tried the Kryo solution above, but it didn't work for SQL. Does Spark support deserializing a case class automatically when writing SQL?
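For reference, the closest workaround I can think of is to flatten the Dataset with typed operations before registering the view, but that is exactly the Dataset API route I want to avoid (a sketch; flat and myFlatTable are just placeholder names):

// Hypothetical: project the fields out via map, then register a flat view.
val flat = ds
  .map(d => (d.id, if (d.children == null) 0 else d.children.size))(
    Encoders.tuple(Encoders.scalaInt, Encoders.scalaInt))
  .toDF("id", "childCount")
flat.createOrReplaceTempView("myFlatTable")
// Now "select id, childCount from myFlatTable" works, but it defeats the purpose.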