4

Suppose a tree structure is implemented in SQL like this:

CREATE TABLE nodes (
    id INTEGER PRIMARY KEY,
    parent INTEGER -- references nodes(id)
);

Although cycles can be created in this representation, let's assume we never let that happen. The table will only store a collection of roots (records where parent is null) and their descendants.

The goal is to, given an id of a node on the table, find all nodes that are descendants of it.

A is a descendant of B if either A's parent is B or A's parent is a descendant of B. Note the recursive definition.

Here is some sample data:

INSERT INTO nodes VALUES (1, NULL);
INSERT INTO nodes VALUES (2, 1);
INSERT INTO nodes VALUES (3, 2);
INSERT INTO nodes VALUES (4, 3);
INSERT INTO nodes VALUES (5, 3);
INSERT INTO nodes VALUES (6, 2);

which represents:

1
`-- 2
    |-- 3
    |   |-- 4
    |   `-- 5
    |
    `-- 6

We can select the (immediate) children of 1 by doing this:

SELECT a.* FROM nodes AS a WHERE parent=1;

We can select the children and grandchildren of 1 by doing this:

SELECT a.* FROM nodes AS a WHERE parent=1
UNION ALL
SELECT b.* FROM nodes AS a, nodes AS b WHERE a.parent=1 AND b.parent=a.id;

We can select the children, grandchildren, and great grandchildren of 1 by doing this:

SELECT a.* FROM nodes AS a WHERE parent=1
UNION ALL
SELECT b.* FROM nodes AS a, nodes AS b WHERE a.parent=1 AND b.parent=a.id
UNION ALL
SELECT c.* FROM nodes AS a, nodes AS b, nodes AS c WHERE a.parent=1 AND b.parent=a.id AND c.parent=b.id;

How can a query be constructed that gets all descendants of node 1 rather than those at a fixed depth? It seems like I would need to create a recursive query or something.

I'd like to know if such a query would be possible using SQLite. However, if this type of query requires features not available in SQLite, I'm curious to know if it can be done in other SQL databases.

Joey Adams
  • 41,996
  • 18
  • 86
  • 115

3 Answers3

9

Some databases allow that using recursive common table expressions, but not SQLite.

You could consider changing your table definition. With a table like this, it's easy to query all descendants of 1:

id (varchar)
--------------
001
001002
001002003
001002003004
001002003005
001002006

This allows you to query all descendants of 1 like:

SELECT * FROM YourTable WHERE id LIKE '001%'

It sounds a bit whacky, but works very well in practice.

Shahood ul Hassan
  • 745
  • 2
  • 9
  • 19
Andomar
  • 232,371
  • 49
  • 380
  • 404
  • Definitely wacky, definitely awesome! Also, what about delimiting IDs with slashes? (e.g. "1/2/3/4") – Joey Adams Apr 30 '10 at 04:30
  • I like this, it's much simpler than my suggestion, though perhaps not quite as versatile. I guess it depends on your actual needs, though... – Dean Harding Apr 30 '10 at 04:36
  • its basically a url :) you could just use a url if you wanted and use the same technique. also could put a marker in for leaf nodes if you happened to want to query for those – Keith Nicholas Apr 30 '10 at 04:54
  • My program stores and deduplicates files (roughly a few million) at the block level in an archive. Both insertion and lookup need to be fast, so I think a solution like this one would be best. – Joey Adams Apr 30 '10 at 04:56
6

The way you've set up your schema doesn't really suit itself very well to the relational model (as you've discovered). But there is another way that may not be so obvious at first, but is much more flexible. It's called a "nested set model" and it just structures things slightly differently.

Managing Hierarchical Data in MySQL (look for the section "The Nested Set Model") shows how to do this in MySQL, but it should translate to SQLite fairly easily. I won't go into too much detail here, because that article is actually pretty long, but it should give you all you need to get going.

Paŭlo Ebermann
  • 73,284
  • 20
  • 146
  • 210
Dean Harding
  • 71,468
  • 13
  • 145
  • 180
  • I believe the relational model is quite suitable. I defined `descendant` using strictly first-order logic, if I'm not mistaken :-) – Joey Adams Apr 30 '10 at 04:21
  • The solution in the link is fixed depth, while the question asks for any depth. Only difference with the query in the question is the use of `left join` instead of `union all` – Andomar Apr 30 '10 at 04:23
  • @Andomar: I don't think you read the whole linked article... start from the section titled "The Nested Set Model" – Dean Harding Apr 30 '10 at 04:28
  • Wow... that would give high cost updates! But it's definitely any depth, +1 :) – Andomar Apr 30 '10 at 04:45
  • 1
    Yeah, updates/inserts/deletes are pretty expensive. I guess it depends on your typical usage. For mostly-read data it's actually surprisingly quick, and flexible. This would be well-suited for large datasets, whereas I'd choose yours if the dataset was smaller. – Dean Harding Apr 30 '10 at 04:48
1

Oracle has a CONNECT BY syntax that could be used for this, but of course it's specific to oracle.

MeBigFatGuy
  • 28,272
  • 7
  • 61
  • 66
  • @Andomar: As I said in the original post: "However, if this type of query requires features not available in SQLite, I'm curious to know if it can be done in other SQL databases." – Joey Adams Apr 30 '10 at 04:32