4

There seem to be several good options to represent hierarchical data in a database, most popular apparently beeing the tree traversal algorithms.

Another option that would probably work in my case is to do it recursively. This could involve saving the parent-id and going from there - though this too needs some sort of direction.

Right now I have a problem where I have a set of items which can be characterized by a chart of connections, yet there is no root and not necessarily a starting point. For example it may happen that items loop around themselves so the ordering is only element per element and not complete. Wether or not the ordering is "parent" or "child" depends on which direction you start in, so to say.
Additionally, each connection should be characterizable by several properties, so the connections need to be identifyable somehow.
Here is an example though note that it is only a small example but you can already see that a simple traversal algorithm might start at 254 but then from going to 203 to 162 then 254 is actually the child to 162 - which maybe a problem (I am far from beeing a cs major so I honestly don't know) example

Another thing is that I am limited to Access, which means I am pretty much limited to your standard SQL commands without recursiveness or functions in SQL.
For example many algorithms in SQL that convert to a Left/Right traversal tree on the fly do not work with Access SQL.
I have a large interest in solving this without much dependence on VBA as well.

As far as performance goes, I expect less that 5000 items though the queries concerning properties of elements and their connections might have tens of elements. The database would be used by less than 10 users simultaneously at first though these things tend to expand quickly here if they work well.

So, how would you implement this construct?

Community
  • 1
  • 1
IMA
  • 261
  • 2
  • 10
  • I confess I haven't looked at your issue in detail. However, have you read Joe Celko on Nested Sets? Might give you some ideas. http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html – Vinny Roe Jan 25 '13 at 09:12
  • 1
    I have read that particular site, yes. I know he has a book out which I don't have. Maybe I am completely wrong but I believe I will run into trouble with my deal because of how undefined by order of elements is. – IMA Jan 25 '13 at 09:22
  • What about the direction to seperate the chart (by algorithm) into parts that can be rendered by a tree traversal? There's gotta be a way to save this information. I mean I can do a regular n:n type relationship table but then I still need an algorithm to spew out the relevant connections for any one element. – IMA Jan 25 '13 at 12:06
  • Is this a tree or a web? It looks multipathed to me. – Walter Mitty Jan 25 '13 at 12:26
  • It is in that it has cycles but each pair of items only have one connection (but possibly in both direction). I guess it is not a tree then? I have no idea what I am doing ;) – IMA Jan 25 '13 at 12:42
  • If item A is connected to item B, does that always mean that item B is connected to item A? Also, can a connection A->B ever have properties that are different from B->A? I'm trying to ascertain whether this graph is directed and whether properties are direction-specific. Also, are properties defined in advance (so they can be represented by a bunch of fields), or you need the ability to add them dynamically? – Branko Dimitrijevic Jan 25 '13 at 17:32
  • 2
    `... ordering is "parent" or "child" depends on which direction you start in ...` -- so it is not a directed graph; hence not a tree nor hierarchy. Start here, http://techportal.inviqa.com/2009/09/07/graphs-in-the-database-sql-meets-social-networks/ – Damir Sudarevic Jan 25 '13 at 17:53
  • you are right, it is not a directed graph. Very helpful that link, much thanks! – IMA Jan 25 '13 at 20:50

1 Answers1

1

I've used Joe Celko's nested sets approach. It works extremely well in the right situations. This is not one of those situations.

A much more flexible approach and the one I would recommend you use is what Bill Karwin calls a closure table.

The basic idea is that you have one record for every possible path. Bill suggests two fields, ancestor_id and descendant_id. It is unclear from your diagram if the ancestor/descendant paradigm really applies in your case.

I also find adding at least one more field for the number of hops between nodes is useful. I would adapt Bill's method by creating a table with three fields:

  1. NodeA
  2. NodeB
  3. Hops

Here is some sample data for your diagram:

NodeA   NodeB  Hops
------  ------ ----
tog171  tog171  0
tog171  abb521  1
abb521  tog171  1
tog171  tog226  2
tog226  tog171  2
tog171  tog218  3
tog218  tog171  3

If there is some semantic meaning to the different colored lines and solid vs. dashed lines then an additional field that captures that semantic meaning could also be added to your table.

You do end up with a lot of entries in your table, but the flexibility is nearly limitless. And in looking at your diagram, flexibility appears to be your biggest need.

EDIT: That first row in my sample data with 0 hops is actually a technique I learned from PJ Eby in his blog post The simplest(?) way to do tree-based queries in SQL. The purpose of those nodes is to make insertion and deletion of nodes simpler. I highly recommend that page for a detailed overview of implementing closure tables.

I think PJ Eby's page is actually a better resource for writing to the closure table, while Bill Karwin's answer has some great examples of reading from the table.

Community
  • 1
  • 1
mwolfe02
  • 23,787
  • 9
  • 91
  • 161
  • I am gonna try to get that to work in pure access sql but thats the second time Bill indirectly gave me db advise - so I will buy his book ;) Anyway, this is what I have been looking for. Thank you mwolfe02 – IMA Jan 25 '13 at 20:53
  • @IMA: I just added a link to another blog post on implementing closure tables. I think that is actually a better "how-to" piece than Karwin's answer. – mwolfe02 Jan 25 '13 at 21:09