Yesterday (practically the whole working day) I tried to figure out an elegant way to represent a model with circular references in Scala/Spark SQL 2.2.1.
Let's say this is the original model approach, which of course does not work (keep in mind that the real model has tens of attributes):
case class Branch(id: Int, branches: List[Branch] = List.empty)
case class Tree(id: Int, branches: List[Branch])
val trees = Seq(Tree(1, List(Branch(2, List.empty), Branch(3, List(Branch(4, List.empty))))))
val ds = spark.createDataset(trees)
ds.show
And this is the error that it throws:
java.lang.UnsupportedOperationException: cannot have circular references in class, but got the circular reference of class Branch
I know that the maximum depth of our hierarchy is 5, so as a workaround I thought of something like:
case class BranchLevel5(id: Int)
case class BranchLevel4(id: Int, branches: List[BranchLevel5] = List.empty)
case class BranchLevel3(id: Int, branches: List[BranchLevel4] = List.empty)
case class BranchLevel2(id: Int, branches: List[BranchLevel3] = List.empty)
case class BranchLevel1(id: Int, branches: List[BranchLevel2] = List.empty)
case class Tree(id: Int, branches: List[BranchLevel1])
Of course, this works. But it is not elegant at all, and you can imagine the pain it causes in the implementation (readability, coupling, maintenance, usability, code duplication, etc.).
So the question is: how should circular references in the model be handled?
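For comparison, one alternative that avoids both the recursive encoder and the five hand-written level classes is to keep the recursive `Branch` for plain Scala code only and store the hierarchy in Spark as flat parent/child rows. The sketch below is hypothetical: `FlatTree`, `BranchRow`, `flatten` and `rebuild` are illustrative names, not part of Spark, and it assumes branch ids are unique within a tree. It is plain Scala, so it runs without Spark.

```scala
// Hypothetical sketch: recursive model for in-memory use, flat rows for Spark.
object FlatTree {
  // Recursive model: fine in plain Scala, but Spark 2.2 cannot derive an encoder for it.
  case class Branch(id: Int, branches: List[Branch] = List.empty)
  case class Tree(id: Int, branches: List[Branch])

  // Flat, non-recursive row. parentId = None means the branch hangs directly off the tree;
  // position preserves the order of siblings.
  case class BranchRow(treeId: Int, id: Int, parentId: Option[Int], position: Int)

  // Depth-first flattening of one tree into rows.
  def flatten(tree: Tree): List[BranchRow] = {
    def go(parent: Option[Int], children: List[Branch]): List[BranchRow] =
      children.zipWithIndex.flatMap { case (b, i) =>
        BranchRow(tree.id, b.id, parent, i) :: go(Some(b.id), b.branches)
      }
    go(None, tree.branches)
  }

  // Rebuild one tree from its rows (for example after collecting them on the driver).
  // Assumes branch ids are unique within a tree.
  def rebuild(treeId: Int, rows: Seq[BranchRow]): Tree = {
    val byParent: Map[Option[Int], Seq[BranchRow]] =
      rows.filter(_.treeId == treeId).groupBy(_.parentId)
    def children(parent: Option[Int]): List[Branch] =
      byParent.getOrElse(parent, Seq.empty)
        .sortBy(_.position)
        .map(r => Branch(r.id, children(Some(r.id))))
        .toList
    Tree(treeId, children(None))
  }
}
```

Since `BranchRow` contains only `Int` and `Option[Int]` fields, a `Seq[BranchRow]` should be accepted by `spark.createDataset` (with `spark.implicits._` in scope), and no maximum depth is baked into the types. The trade-off is that walking the hierarchy inside Spark SQL then requires self-joins on `parentId`, and the nested shape only exists again after rebuilding it in plain Scala.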