
Let's say I have a recursive case class (one that references itself):

case class Dummy(id: Int, children: List[Dummy])

I cannot use Spark's default encoder with it, because the default encoder does not support circular references.
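For example, deriving the default encoder fails as soon as Spark reflects over the type (a minimal sketch, assuming the usual spark session and its implicits):

import spark.implicits._

// Throws at encoder derivation: the reflective ExpressionEncoder
// rejects self-referencing types
val failing = spark.createDataset(Seq(Dummy(1, Nil)))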

java.lang.UnsupportedOperationException: cannot have circular references in class, but got the circular reference of class Dummy

So I need to use a custom encoder such as Kryo, which supports recursive case classes:

import org.apache.spark.sql.{Encoder, Encoders}

val myEncoder: Encoder[Dummy] = Encoders.kryo[Dummy]
val ds = spark.createDataset(Seq(Dummy(1, null)))(myEncoder)
ds.createOrReplaceTempView("myTable")
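
With this encoder, the typed Dataset API behaves as expected, e.g. (a quick sketch, assuming spark.implicits._ is in scope for the result encoder):

// Typed access works because Kryo round-trips the whole object graph
ds.map(_.id).show()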

So the typed API works fine and I can operate on the Dataset without problems, but what I need is to write Spark SQL that operates on the fields of the case class, like below:

%%sql
select id, size(children) as childCount from myTable

But I got this error:

org.apache.spark.sql.AnalysisException: cannot resolve '`id`' given input columns: [myTable.value]; line 1 pos 7;

Because the Kryo encoder serializes the whole object into a single binary column, so the schema is just:

value: binary (nullable = true)

Note: the problem looks similar to spark-using-recursive-case-class but is not the same. I need a way to write SQL that operates on the case class fields instead of using the Dataset API.

I have already tried the Kryo solution, but it doesn't help here. Does Spark support automatically deserializing a case class when writing SQL?
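
For context, a workaround that does go through the Dataset API (which is exactly what I want to avoid) is to flatten into real columns before registering the view. A sketch, where myTableFlat and childCount are illustrative names:

import spark.implicits._

// Deserialize via the typed API, then expose plain columns to SQL
val flat = ds
  .map(d => (d.id, Option(d.children).map(_.size).getOrElse(0)))
  .toDF("id", "childCount")
flat.createOrReplaceTempView("myTableFlat")
spark.sql("select id, childCount from myTableFlat").show()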

  • Why do you want a circular reference? Why not just create another case class? – Gaurang Shah Jul 05 '23 at 15:08
  • This is a tree structure, so the circular reference is necessary. An alternative is to replace children: List[Dummy] with childIds: List[Int] and keep a separate map from id to element (sketched below), but that would be harder to use. – lty Jul 06 '23 at 05:34
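
For reference, the alternative shape mentioned in that comment would look roughly like this (a sketch; FlatDummy is a hypothetical name, and this shape works with Spark's default product encoder since it has no self-reference):

// Store child ids instead of nested children, plus a separate lookup map
case class FlatDummy(id: Int, childIds: List[Int])

val nodes = Seq(FlatDummy(1, List(2, 3)), FlatDummy(2, Nil), FlatDummy(3, Nil))
val byId: Map[Int, FlatDummy] = nodes.map(n => n.id -> n).toMap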

0 Answers