34

I'm starting a project and I'm in the designing phase: I.e., I haven't decided yet on which db framework I'm going to use. I'm going to have code that creates a "forest" like structure. That is, many trees, where each tree is a standard: nodes and edges. After the code creates these trees I want to save them in the db. (and then pull them out eventually)

The naive approach to representing the data in the db is a relational db with two tables: nodes and edges. That is, the nodes table will have a node id, node data, etc.. And the edges table will be a mapping of node id to node id.

Is there a better approach? Or given the (limited) assumptions I'm giving this is the best approach? How about if we add an assumption that the trees are relatively small - is it better to save the whole tree as a blob in the db? Which type of db should I use in that case? Please comment on speed/scalability.

Thanks

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
Guy
  • 14,178
  • 27
  • 67
  • 88
  • I've been thinking about this too and I thought of storing the array of parents of current node. I haven't tested it out but would like to know if that is an optimal solution. – KJW Jul 04 '11 at 04:52
  • It sounds like you aren't using the database as a database. Typically trees are used to accomplish some other abstract goal, such as the creation of sets or maps/dictionaries. I think you need to figure out the abstract data structure you are using and ask how to map *that* to a database. What are you storing in your forest? – 101100 Jul 04 '11 at 04:53
  • 1
    What sort of queries do you need to support? You could use adjacency lists (your current idea), nested sets, or even (with appropriate database support) arrays of node ids to represent the path from the root. Which representation you choose depends on what you need to do to your data. – mu is too short Jul 04 '11 at 05:06
  • @Kim, why array? If the graph is a tree, each node has at most one parent. – svick Jul 04 '11 at 12:30
  • @svick, so it's easier to filter sub trees based on ancestors. – KJW Jul 04 '11 at 21:43

2 Answers2

23

I showed a solution similar to your nodes & edges tables, in my answer to the StackOverflow question: What is the most efficient/elegant way to parse a flat table into a tree? I call this solution "Closure Table".

I did a presentation on different methods of storing and using trees in SQL, Models for Hierarchical Data with SQL and PHP. I demonstrated that with the right indexes (depending on the queries you need to run), the Closure Table design can have very good performance, even over large collections of edges (about 500K edges in my demo).

I also covered the design in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Lets say I decide to keep a datastructure that represents the tree in memory (since I have many trees with keys pointing to them. i.e., key-->tree). For example, I can make a JSON out of my tree and save it as string. Am I better off using a non-relational db in this case? and (since I've never used a non-relational db) which non-relational db can you recommend? – Guy Jul 06 '11 at 11:07
  • 1
    You can sure stuff a JSON string into a non-relational db, but you won't be able to do anything with it until you pull it out of the db. If you would prefer to fetch the whole tree, deserialize it to application objects, and write recursive functions to walk the tree, that's fine. If you want a query language with which you can fetch subtrees or paths through the tree, use relational modeling. – Bill Karwin Jul 06 '11 at 11:59
  • @Bill Karwin See http://stackoverflow.com/questions/13302187/sql-query-optimize related to your slide. – Mohammed H Nov 09 '12 at 04:38
  • i do not understand how does the closure table keep the structure data of the selected sub-tree. you just get "a bunch of nodes having the same parent", no information on the structure itself with selects – user151496 Apr 21 '15 at 13:10
  • 1
    @user151496, if you query a subtree starting at node C, you get all the set of all nodes below C. You're right that alone is not enough to show the hierarchy. But you can join those descendent nodes to path segments connecting to their ancestors or descendants. I show examples in my presentation that I linked to. – Bill Karwin Apr 21 '15 at 17:03
  • i probably didn't undestand the query properly then :( do you mean the query at page 49? because it seems like it does not query the structure. nor the latter querry that selects immediate parent/child – user151496 Apr 22 '15 at 09:17
  • @user151496, right, the query on slide 49 queries the set of descendants of node #4, but it doesn't show how those nodes relate to each other. The query for immediate parent returns only one node, so you know which one it is. The query for immediate children return several, but you know they're all siblings of each other. – Bill Karwin Apr 22 '15 at 15:45
  • @BillKarwin And if it is saved in an XML structure? – Gus May 04 '16 at 20:28
  • 1
    @Gus, If you have an XML document, then store it in a `TEXT` column. If you want to be able to query for individual elements, you might be able to use builtin XML functions, for example for MySQL: http://dev.mysql.com/doc/refman/5.7/en/xml-functions.html. But that won't be indexed. If you want high-performance queries, break up the XML and store the elements in discrete rows. – Bill Karwin May 04 '16 at 23:48
1

Be sure to use some sort of low level-coding for the entity being treed to prevent looping. The entity might be a part, subject, folder, etc.

With an Entity file and and Entity-Xref file you can loop through one of say two relationships between the two files, a parent and a child relation.

A level is the level an entity found in a tree. A low-level-code for the entity is the lowest level an entity is found in any tree anywhere. Check to make sure the low level code of the entity you want to make a child is less than or equal to prevent a loop. after adding an entity as a child it will become at least one level lower.

Tuxdude
  • 47,485
  • 15
  • 109
  • 110
eric
  • 11
  • 1