
Let's say I have a recursive case class (one that references itself):

case class Dummy(id: Int, children: List[Dummy])

I cannot use Spark's default encoder with it, because the default encoder does not support circular references.
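For example, deriving the default encoder fails as soon as Spark reflects over the type (a minimal sketch, assuming the usual spark session and its implicits):

import spark.implicits._

// Throws at encoder derivation: the reflective ExpressionEncoder
// rejects self-referencing types
val failing = spark.createDataset(Seq(Dummy(1, Nil)))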

java.lang.UnsupportedOperationException: cannot have circular references in class, but got the circular reference of class Dummy

So I need to use a custom encoder such as Kryo, which supports recursive case classes:

import org.apache.spark.sql.{Encoder, Encoders}

val myEncoder: Encoder[Dummy] = Encoders.kryo[Dummy]
val ds = spark.createDataset(Seq(Dummy(1, null)))(myEncoder)
ds.createOrReplaceTempView("myTable")
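
With this encoder, the typed Dataset API behaves as expected, e.g. (a quick sketch, assuming spark.implicits._ is in scope for the result encoder):

// Typed access works because Kryo round-trips the whole object graph
ds.map(_.id).show()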

So the typed API works fine and I can operate on the Dataset without problems, but what I need is to write Spark SQL that operates on the fields of the case class, like below:

%%sql
select id, size(children) as childCount from myTable

But I got this error:

org.apache.spark.sql.AnalysisException: cannot resolve '`id`' given input columns: [myTable.value]; line 1 pos 7;

Because the Kryo encoder serializes the whole object into a single binary column, so the schema is just:

value: binary (nullable = true)

Note: the problem looks similar to spark-using-recursive-case-class but is not the same. I need a way to write SQL that operates on the case class fields instead of using the Dataset API.

I have already tried the Kryo solution, but it doesn't help here. Does Spark support automatically deserializing a case class when writing SQL?
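
For context, a workaround that does go through the Dataset API (which is exactly what I want to avoid) is to flatten into real columns before registering the view. A sketch, where myTableFlat and childCount are illustrative names:

import spark.implicits._

// Deserialize via the typed API, then expose plain columns to SQL
val flat = ds
  .map(d => (d.id, Option(d.children).map(_.size).getOrElse(0)))
  .toDF("id", "childCount")
flat.createOrReplaceTempView("myTableFlat")
spark.sql("select id, childCount from myTableFlat").show()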

  • Why do you want a circular reference? Why not just create another case class? – Gaurang Shah Jul 05 '23 at 15:08
  • This is a tree structure, so the circular reference is necessary. An alternative is to replace children: List[Dummy] with childIds: List[Int] and keep a separate map from id to element (sketched below), but that would be harder to use. – lty Jul 06 '23 at 05:34
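
For reference, the alternative shape mentioned in that comment would look roughly like this (a sketch; FlatDummy is a hypothetical name, and this shape works with Spark's default product encoder since it has no self-reference):

// Store child ids instead of nested children, plus a separate lookup map
case class FlatDummy(id: Int, childIds: List[Int])

val nodes = Seq(FlatDummy(1, List(2, 3)), FlatDummy(2, Nil), FlatDummy(3, Nil))
val byId: Map[Int, FlatDummy] = nodes.map(n => n.id -> n).toMap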

0 Answers