What is the most efficient way of storing complex, self-referencing tree structures in an SQL database?

Question

I'm aware of and have used two methods in the past for basic tree structures: adjacency lists and nested sets. I understand several pros and cons to these approaches - e.g., that adjacency lists are quick to update but slow to query, that nested sets are the opposite (slow to update, quick to query).

However, I require the ability to store more complex tree-like structures. The best way to describe this is using human family relationships. My first thought was that each element could have both a "ancestors" tree and a "descendants" tree. However, there will be significant redudancy in this approach as, using the example below, both Cameron and Kelly will share all of Bob's ancestor tree (and updates will be even more time consuming, as an insertion to the tree will actually have to insert into multiple trees). My second thought was to include tree references. E.g., say that Alice has her own ancestor tree. Both the (4,5) element coming from Cameron's ancestor tree, and the (2,3) element coming from Kelly's ancestor tree would simply reference Alice's ancestor tree. This second approach requires less data storage, would experience faster updates (only an update to a single tree vs. multiple trees) and would retain the speed advantages of querying a large tree structure (although the SQL to query such a self-referencing nested set is rather complicated). However, a drawback to the second approach is that the data becomes "fragmented" (much like inodes on a hard drive).

 A = Alice                                  
 B = Bob                                    
 C = Cameron                                
 J = John                                   
 K = Kelly                                  

+------------------------------------------+

  +-----------+            +-------------+  
  |(2) Bob (3)|            |(4) Alice (5)|  
  +-----+-----+            +------+------+  
        |                         |         
        |                         |         
        |                         |         
        |                         |         
        |    +---------------+    |         
        +----+(1) Cameron (6)+----+         
             +---------------+              

  +-------------+            +------------+ 
  |(2) Alice (3)|            |(4) John (5)| 
  +-----+-------+            +-------+----+ 
        |                            |      
        |                            |      
        |                            |      
        |                            |      
        |      +-------------+       |      
        +------+(1) Kelly (6)+-------+      
               +-------------+              

+------------------------------------------+

           +-------------+                  
           |(1) Alice (6)+----------+       
           +-+-----------+          |       
             |                      |       
             |                      |       
             |                      |       
   +---------+-----+        +-------+-----+ 
   |(2) Cameron (3)|        |(4) Kelly (5)| 
   +---------------+        +-------------+

For the second approach, I am visualising multiple nested sets stacked in front of one another, with certain nodes "drawing a line" along the z-index to a node on another plane.

Note that the above is just an example - I'm not actually storing human relationships, but I am storing complex tree like data. There are many reasons to store such complex hierarchical structures, so I'll let you imagination run wild!

The question: What is the most efficient way performance wise (updates and selects) of storing complex, self-referencing tree structures in an SQL database? I'm specifically referring to PostgreSQL, but if you have alternatives (even to SQL itself), I'm open to hearing that as well.

possible duplicate of [What are the Options for Storing Hierarchical Data in a Relational Database?](http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database) — Matteo Tassinari, Jun 26 '14 at 06:41
Are you using a DBMS that supports recursive queries (so essentially anything but MySQL?) — , Jun 26 '14 at 06:51
Updated question to reference PostgreSQL as the database in use. — magnus, Jun 26 '14 at 06:52

score 0 · Answer 1 · answered Jun 26 '14 at 06:49

0

In Oracle this can be achieve by using start with .. connect by. Also known as Hierarchical Queries.

for example:

select * 
 from <table_name> 
 start with <parent_column_name> is null
 connect by prior <child_column_name> = <parent_column_name>;

answered Jun 26 '14 at 06:49

Charlesliam

1,293
3
20
36

What is the most efficient way of storing complex, self-referencing tree structures in an SQL database?

1 Answers1