4

Possible Duplicate:
What is the most efficient/elegant way to parse a flat table into a tree?

This I am finding rather tricky and would like some opinions on the matter. I am trying to store hierarchal data (tree like) with an unknown number of levels and branches. I am wanting to be able to add new ones and delete any at any time.

I need to be able to query from any node in the hierarchy for all of the children id's in one go and efficiently due to large user base.

Lets take a hypothetical example of a website where families socialise and update their status like in facebook and at any time you can be viewing a family members "Wall" which will also include all of the recent status updates form the people below them in the hierarchy in chronological order.

Obviously the fetching posts once you have the array of family members id's who are children of this family members node is easy enough in a loop.

Lets take an example simple table structure of:

id  |  parentId  |  name
________________________

1   |    NULL    |  John
2   |     1      |  Peter
3   |     1      |  Bob
4   |     3      |  Emma
5   |     2      |  Sam
6   |     4      |  Gill

etc.... You get the idea.

I need to be able to do the above with something like this unless you think the structure needs to be adapted.

I have read up on mySql nested set model. This seems very fiddly and could be unreliable if something was not to update correctly and would mess everything up.

I am used to using php and mysql but have been reading a bit on cassandra and thrift. Not sure if this would be easier?

Community
  • 1
  • 1
Stefan P
  • 1,013
  • 2
  • 18
  • 34
  • 2
    I know it looks fiddly but nested set model is really what you want. Its harder to explain/describe the this than it is to implement, and the resulting SQLs are an order of magnitude simpler and better performing than ant parent child pointer solution – James Anderson Dec 14 '10 at 01:34

3 Answers3

1

There are already good approaches out there which are more simple than the solution you propose.

Here are a couple of links which explain how to do it (we use this ourselves for much the same problem you describe and it works well).

This makes inserting/updating more complex, but selecting portions of the tree structure far faster (with only one query). It allows finding all children of any given node in one query, and finding all the ancestors of a given node with one query.

El Yobo
  • 14,823
  • 5
  • 60
  • 78
  • Hey El Yobo, I am aware of this technique, I mentioned it in my post. Thanks for the heads up though and answer! If you think about it there is in fact only one query as well using the method I proposed. As all I need is an array of children Id's so I can then find relative post information from other tables. Am I not right? I suppose getting the family members name is also a query... hmmm... choices... Being able to add in data and delete is very important as well for my application as it needs to be able to handle lots of data in each hierarchy and I dont want to risk "corruption" – Stefan P Dec 14 '10 at 23:32
  • Also after looking further into the Nested method, it requires an extra sql query anyway to retrieve the root node's rgt and lft values. So it's splitting hairs where as with my proposed method there is no additional step. Also I would be looking to have the rgt and lft values start round different family trees in the same table, this would add further complications I believe due to additional conditional statements? It would be a bit unruly with 10,000+ nodes. – Stefan P Dec 14 '10 at 23:53
  • Ah, sorry, I didn't recognise the "nested set model" name :) It's really not very fiddly though; a little bit of work to set it up the first time, then it's all good. Your approach above seems much more complex to me. – El Yobo Dec 14 '10 at 23:58
  • I'm not sure what you mean about the extra query; to build an entire tree or subtree is a single query for us. Perhaps I'm misunderstanding what you're trying to do. – El Yobo Dec 15 '10 at 00:00
0

Yes, your solution is called a transitive closure. I have written about it before:

You also need the zero-length paths, e.g. 2-2, 3-3, 4-4.

Community
  • 1
  • 1
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
0

So I think I have come up with an idea.

The reason I am against the nested set model is because it seems like it is still not the best way and is not going to be the ideal performance solution.

I am going to cover a proposed solution I have been thinking about. The concept means creating an hierarchal map table to keep track of all the relationships between each family member/node.

The way it would work is:

Using map table structure of this:

id  |  fMemberId   |   parentid
=====================================
1   |      3       |       2  
2   |      4       |       3
3   |      4       |       2

1) As a new family member is created as a child of a parent we would take the parents id and create a new row in our family members table with the parent id set for future additional uses and functionality.

2) As this row is created we will create new rows with all of the parent id's for the new family member.

A quick way to do this would be to take the parent id from the new family member and do a query to the map table to find all the rows with the family member id the same as the new family members parent id and then store an array in php of the subsequent parent ids required for storing alongside the new family members id in the map table. This would then only require one sql query for grabbing all the parent id's for adding them rather than a number of queries based on the number of nodes

This would mean when we are viewing a family members feed of posts we would be able to query the db for simply the rows in the map table to get all the children id's of the current family member and subsequently query other tables for the post data.

The main trade off being the amount of potential storage required for this kind of system. However I believe reading speed would be quicker as there is no conditional SQL statements and also maybe just as quick to write to db in this way.

We could overcome this by using InnoDB's cluster id's assigning an initial family id index and creating a new table with the "next family members id" based on the family id.

Also reliability, if a row wasn't written it would be easy enough to add it in. It prevents having to continually edit rows just to create a member.

What are your thoughts on this?

So far this seems to be a good way in my opinion. Took a lot of thinking to get to here. I also believe it could maybe be improved with time and being able to store arrays of id's per member rather than all of them. Still trying to work that one out!

Stefan P
  • 1,013
  • 2
  • 18
  • 34