I've got a pretty simple tree implementation in SQL:
CREATE TABLE [dbo].[Nodes] (
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](max) NULL
);
CREATE TABLE [dbo].[NodeNodes] (
[ParentNodeId] [int] NOT NULL,
[ChildNodeId] [int] NOT NULL
);
My tree implementation is such that a node can have multiple parents. This is so the user can create custom trees that group together commonly used nodes. For example:
1 8 9
/ \ / \ / \
2 3 4 7 2 6
/ \ / \ / \
4 5 6 7 4 5
Node | Parents | Children
---------------------------
1 | - | 2,3
2 | 1,9 | 4,5
3 | 1 | 6,7
4 | 2,8 | -
5 | 2 | -
6 | 3,9 | -
7 | 3,8 | -
8 | - | 4,7
9 | - | 2,6
So there are three trees which are indicated by the three nodes with no parent. My problem is validating a potential relationship when the user adds a node as a child of another. I would like no node to appear twice in the same tree. For example, adding node 2 as a child of node 6 should fail because that would cause node 2 to appear twice in 1's tree and 9's tree. I'm having trouble writing an efficient algorithm that does this.
My first idea was to find all the roots of the prospective parent, flatten the trees of the roots to get one list of nodes per tree, then intersect those lists with the prospective child, and finally pass the validation only if all of the resultant intersected lists are empty. Going with the example, I'd get these steps:
1) Trace prospective parent through all parents to roots:
6->3->1
6->9
2) Flatten trees of the roots
1: {1,2,3,4,5,6,7}
9: {2,4,5,6,9}
3) Intersect lists with the prospective child
1: {1,2,3,4,5,6,7}^{2} = {2}
9: {2,4,5,6,9}^{2} = {2}
4) Only pass if all result lists are empty
1: {2} != {} ; fail
9: {2} != {} ; fail
This process works, except for the fact that it requires putting entire trees into memory. I have some trees with 20,000+ nodes and this takes almost a minute to run. This performance isn't a 100% dealbreaker, but it is very frustrating. Is there a more efficient algorithm to do this?
Edit 4/2 2pm
The above algorithm doesn't actually work. deroby pointed out that adding 9 as a child to 7 will be passed by the algorithm but shouldn't be. The problem is that adding a node with children to another node will succeed as long as the node isn't repeated -- it doesn't validate the children.