0

I've got a pretty simple tree implementation in SQL:

CREATE TABLE [dbo].[Nodes] (
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](max) NULL
);

CREATE TABLE [dbo].[NodeNodes] (
    [ParentNodeId] [int] NOT NULL,
    [ChildNodeId] [int] NOT NULL
);

My tree implementation is such that a node can have multiple parents. This is so the user can create custom trees that group together commonly used nodes. For example:

      1              8         9
    /   \           / \       / \
   2     3         4   7     2   6
  / \   / \                 / \
 4   5 6   7               4   5

Node | Parents | Children
---------------------------
  1  |    -    |   2,3
  2  |   1,9   |   4,5
  3  |    1    |   6,7
  4  |   2,8   |    -
  5  |    2    |    -
  6  |   3,9   |    -
  7  |   3,8   |    -
  8  |    -    |   4,7
  9  |    -    |   2,6

So there are three trees which are indicated by the three nodes with no parent. My problem is validating a potential relationship when the user adds a node as a child of another. I would like no node to appear twice in the same tree. For example, adding node 2 as a child of node 6 should fail because that would cause node 2 to appear twice in 1's tree and 9's tree. I'm having trouble writing an efficient algorithm that does this.

My first idea was to find all the roots of the prospective parent, flatten the trees of the roots to get one list of nodes per tree, then intersect those lists with the prospective child, and finally pass the validation only if all of the resultant intersected lists are empty. Going with the example, I'd get these steps:

1) Trace prospective parent through all parents to roots:
6->3->1
6->9

2) Flatten trees of the roots
1: {1,2,3,4,5,6,7}
9: {2,4,5,6,9}

3) Intersect lists with the prospective child
1: {1,2,3,4,5,6,7}^{2} = {2}
9: {2,4,5,6,9}^{2} = {2}

4) Only pass if all result lists are empty
1: {2} != {} ; fail
9: {2} != {} ; fail

This process works, except for the fact that it requires putting entire trees into memory. I have some trees with 20,000+ nodes and this takes almost a minute to run. This performance isn't a 100% dealbreaker, but it is very frustrating. Is there a more efficient algorithm to do this?

Edit 4/2 2pm

The above algorithm doesn't actually work. deroby pointed out that adding 9 as a child to 7 will be passed by the algorithm but shouldn't be. The problem is that adding a node with children to another node will succeed as long as the node isn't repeated -- it doesn't validate the children.

Community
  • 1
  • 1
ptorian
  • 1
  • 2
  • Just to be certain I understand things correctly : adding 9 as a child to 7 would thus be OK in the example ? – deroby Apr 02 '12 at 16:48
  • Adding 9 as a child to 7 violates what I'm trying to do because all of 9's children end up getting repeated, but it isn't caught using the algorithm above. So it looks like the above algorithm works fine if you're adding leaf nodes, but if you're trying to add nodes with children, it doesn't validate the children. – ptorian Apr 02 '12 at 18:01
  • When you say "node 2", do you mean "a row in the Nodes table with an Id (column) with a value of 2"? – Lasse V. Karlsen Apr 02 '12 at 18:13
  • Yeah, all the numbers refer to the node ids. – ptorian Apr 02 '12 at 18:17
  • I'm wondering if maybe there is something wrong with the solution -- what problem are you trying to solve? – Hogan Apr 02 '12 at 20:55

1 Answers1

0

A year later I stumbled upon my own question and I decided I would add my solution. It turns out I had just forgotten my basic data structures. What I originally thought was a simple tree was actually a directed graph, and what I was testing for was a cycle. Seeing as how cycle detection is a pretty common thing, there should be numerous solutions and discussions about it out there on the internets. See Best algorithm for detecting cycles in a directed graph for one example.

Community
  • 1
  • 1
ptorian
  • 1
  • 2