4

I don't know how to store my hierarchical data in my innoDB-table.

I've read a lot about the disadvantages of the method of storing the parent_id in each row. But now the problem is, that I have a very large database (~50 million rows). The hierarchy is mostly not very deep (3-6 levels).

Many websites advise taking the "Nested Set Model" as a better alternative to the parent-id-storing-method. But there are always changes being made (UPDATE, INSERT etc.) by the users of the website and because of the size of my table, this would take too much time (since changes in the "Nested Set Model" have a very low performance).

So my question is: How do you store efficiently large hierarchical data with many update/insert commands? (Also blocking the whole table is not an option [-> innoDB-table])

Kate Gregory
  • 18,808
  • 8
  • 56
  • 85
atreju
  • 477
  • 4
  • 14
  • possible duplicate of [SQL - how to store and navigate hierarchies](http://stackoverflow.com/questions/38801/sql-how-to-store-and-navigate-hierarchies) – markus Jan 01 '13 at 21:24
  • This question and its answers should provide all answers to your question. – markus Jan 01 '13 at 21:24
  • @markus-tharkun: well, not really. I know that for smaller tables or tables with few update-commands, the Nested Set is probably the best solution. But my problem is the db-size and the large number of update-commands. The question is now, if the parent-id-storage-method is in this situation better than the Nested Set... – atreju Jan 01 '13 at 21:28
  • @markus-tharkun: have a look at the last comment of the top answer. This is exactly my problem. But in that post, no one responded to this comment. – atreju Jan 01 '13 at 21:32
  • 1
    The Adjacency List Model is faster if you do not have to check if the new parent is not also a child to this row. This would not happen with Nested Sets. – Sir Rufo Jan 01 '13 at 21:49

2 Answers2

2

The Nested Sets design is definitely difficult when you need to make frequent updates to the tree. You end up having to renumber large parts of the tree.

One suggestion for mitigating this is to use floating-point numbers instead of integers. If you insert a new node in the tree, it's relatively easy to find some FLOAT numbers in between the nested set numbers of the parent of the new node. You may eventually get to the limits of the precision of a floating-point number, but since your tree isn't very deep that won't happen for a long time.

Another technique which I have written about I call Closure Table. This method of storing hierarchies makes it much easier to insert/update/delete nodes in a large tree without needing to update a lot of your tree. And you can still query the whole tree or any subtree in a single non-recursive SQL query.

To read more about Closure Table, see:


Re your comment:

Adjacency List is simple, has a minimum of redundancy, and it supports FK relationships, which Nested Sets does not.  Adjacency List supports querying a whole tree of arbitrary depth if you use recursive queries. But MySQL does not support recursive queries.

If you need to query only immediate parent-child relationships (i.e. one level of depth), or otherwise query only trees of fixed depth, then Adjacency List is fine.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thank you for your answer. This really could be another possibility. But still I'm not sure why the Adjacency List Model isn't that good, as I have used Foreign Keys to relate the parent-row with the child-rows. So when I analize the SELECT-queries with `EXPLAIN SELECT`, there are only the number of selected child-nodes affected. This seems like the "perfect" solution. No? – atreju Jan 01 '13 at 22:16
1

For hierarchical data I like to keep the hierarchy separate. For instance, if we were dealing with an employee hierarchy, I generally do something like this -

create table employee (
    id serial primary key,
    name varchar(50));

create table roster (
    id serial primary key,
    employee_id int references employee (id),
    supervisor_id int references employee (id));

This can be extended to provide historical hierarchies by adding either a row_date or start_date and stop_date fields to the roster table.

Be sure that you have unique constraints and triggers applied where applicable to enforce business rules.

pyrospade
  • 7,870
  • 4
  • 36
  • 52