0

I'm relatively new to Spring and am hoping someone can help point me in the right direction here.

Imagine a to-do list app... Let's say I have a main entity called Task and a task can have n nested subtasks (each of which is a Task and can have n subtasks, etc.) I'm thinking that hte best way to represent this type of relationship is using a join table (see below for high level definition). The main issue here is circular references to the parent/child tasks. For example, if TaskA has a subtask, TaskB, and subtask TaskB has a subtask, TaskA.

I am wondering:

  1. Is there a way in spring boot / JPA to define some type of table validation or a SQL CHECK CONSTRAINT that will ensure that a set of task ids is not entered into the join table twice? E.g. in the above scenario an exception would be thrown b/c Set<TaskA, TaskB> already exists in the table.
  2. Is this the best approach to represent this type of object relationship? Is there a better way to do this with Spring Data / JPA given the requirements?

Thanks so much in advance!

@Entity
@Table(name = "task")
public class Task {
  ...
}

@Entity
@Table(name = "subtasks")
public class SubTasks {
  ...
  private Task parentTask;
  private Task childTask;
}
  • There are [couple of options to store hierarchal data in db](https://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database) – Andrey B. Panfilov Jul 28 '22 at 05:11
  • I don't see why you would have a join table instead of just a self reference to the parent. – Jens Schauder Jul 28 '22 at 07:11

1 Answers1

0

You can do this kind of thing either in the application or in the database.

Doing it in the application is typically easier for the average Java developer to get something that is working in most of the cases. But it can get really hard to get something that is really water tight and even more difficult to prove that it is water tight and I don't even consider mathematical proves here.

Here are the options that I see:

  1. During construction of the entities ensure that you never have a circular dependency in the first place. No special framework or library support needed. Just whenever the hierarchy changes travers it up and collect ids, ensuring you don't encounter any id twice.

  2. You can use Bean Validation to check for cycles before saving. You can create custom validators. Both approaches have the problem that it is possible to create cycles they don't detect when nodes get manipulated in distinct sessions: Both sessions load the disconnected nodes A and B. Session one connects A -> B. Session two does the opposite. Neither sees a cycle. Since both edit a different Task objects or even just create new Task instances even optimistic locking won't help. If you update all entities in the hierarchy though, e.g. by letting them contain the id of the top most parent, optimistic locking should prevent that problem.

  3. You could write a trigger on the database that obtains an exclusive lock on the database, and performs the check. With the exclusive lock it should be impossible to get any cycle past that check. Obviously this limits concurrency.

  4. If your database supports it you might be able to create a materialized view that gets updated on every change and define a check constraint on that. Using the CONNECT BY clause it should be possible to create a view for example that contains the top most parent for each entry and a flag if a cycle is found and then put a check constraint on that column. This probably should behave just as the previous variant, but might be easier to maintain than the previous solution.

Jens Schauder
  • 77,657
  • 34
  • 181
  • 348