174

What would be the best way to implement a customizable tree data structure (meaning, a tree structure with an unknown number of level) in a database?

I've done this once before using a table with a foreign key to itself.

What other implementations could you see, and does this implementation make sense?

kaartic
  • 523
  • 6
  • 24
CodeMonkey1313
  • 15,717
  • 17
  • 76
  • 109
  • 4
    See: [What are the Options for Storing Hierarchical Data in a Relational Database?](http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database) – cbare Dec 18 '12 at 15:14
  • SQL Server (since 2008) offers the [hierarchyid data type](https://msdn.microsoft.com/en-us/library/bb677290.aspx) – BornToCode Jun 16 '15 at 08:39

5 Answers5

92

You mention the most commonly implemented, which is Adjacency List: https://blogs.msdn.microsoft.com/mvpawardprogram/2012/06/25/hierarchies-convert-adjacency-list-to-nested-sets

There are other models as well, including materialized path and nested sets: http://communities.bmc.com/communities/docs/DOC-9902

Joe Celko has written a book on this subject, which is a good reference from a general SQL perspective (it is mentioned in the nested set article link above).

Also, Itzik Ben-Gann has a good overview of the most common options in his book "Inside Microsoft SQL Server 2005: T-SQL Querying".

The main things to consider when choosing a model are:

1) Frequency of structure change - how frequently does the actual structure of the tree change. Some models provide better structure update characteristics. It is important to separate structure changes from other data changes however. For example, you may want to model a company's organizational chart. Some people will model this as an adjacency list, using the employee ID to link an employee to their supervisor. This is usually a sub-optimal approach. An approach that often works better is to model the org structure separate from employees themselves, and maintain the employee as an attribute of the structure. This way, when an employee leaves the company, the organizational structure itself does not need to be changes, just the association with the employee that left.

2) Is the tree write-heavy or read-heavy - some structures work very well when reading the structure, but incur additional overhead when writing to the structure.

3) What types of information do you need to obtain from the structure - some structures excel at providing certain kinds of information about the structure. Examples include finding a node and all its children, finding a node and all its parents, finding the count of child nodes meeting certain conditions, etc. You need to know what information will be needed from the structure to determine the structure that will best fit your needs.

JeremyDWill
  • 3,132
  • 21
  • 18
  • Hi, I am facing this exact same problem stated in the question and would like to ask you a question about the topics above. Considering a structure as in number one topic (organizational structured table (not employee structured) with ParentId referenced in the same table), I need to set who is the boss of a certain area. I will assign all the employees of that specific area directly to it. Where would you put the boss of that specific area? Inside the same area or one gorup above? My approach is to reference him/her to the group above, that gives me a better structure I think. Thanks. – Marcos Buarque Oct 09 '09 at 15:31
  • 1
    First link seems to be broken. – Jorge Leitao Oct 23 '13 at 12:19
66

Have a look at Managing Hierarchical Data in MySQL. It discusses two approaches for storing and managing hierarchical (tree-like) data in a relational database.

The first approach is the adjacency list model, which is what you essentially describe: having a foreign key that refers to the table itself. While this approach is simple, it can be very inefficient for certain queries, like building the whole tree.

The second approach discussed in the article is the nested set model. This approach is far more efficient and flexible. Refer to the article for detailed explanation and example queries.

Paŭlo Ebermann
  • 73,284
  • 20
  • 146
  • 210
Ayman Hourieh
  • 132,184
  • 23
  • 144
  • 116
13

If you have to use Relational DataBase to organize tree data structure then Postgresql has cool ltree module that provides data type for representing labels of data stored in a hierarchical tree-like structure. You can get the idea from there.(For more information see: http://www.postgresql.org/docs/9.0/static/ltree.html)

In common LDAP is used to organize records in hierarchical structure.

yurilo
  • 3,079
  • 2
  • 23
  • 14
2

Having a table with a foreign key to itself does make sense to me.

You can then use a common table expression in SQL or the connect by prior statement in Oracle to build your tree.

Aaron Daniels
  • 9,563
  • 6
  • 45
  • 58
  • I have a log table, with a LogID identity column, and a ParentLogID column with a FK that points back to LogID column. When the first log row in a transaction is written, I grab SCOPE_IDENTITY(). All other log records get written with this value in the ParentLogID column. This is really useful for grouping rows that belong together. It is the only real way to see what happened, without this, it would be a huge mess of log rows from multiple transactions all mixed together. – KM. Jun 01 '09 at 15:22
  • @KM - He said "does make sense" not "doesn't make sense" – John Rasch Jun 01 '09 at 16:39
0

If anyone using MS SQL Server 2008 and higher lands on this question: SQL Server 2008 and higher has a new "hierarchyId" feature designed specifically for this task.

More info at https://learn.microsoft.com/en-us/sql/relational-databases/hierarchical-data-sql-server

Alex from Jitbit
  • 53,710
  • 19
  • 160
  • 149