8

I have a tree hierarchy look this built into a table with the parent_id pointing to the previous root node.

I am iterating through all root nodes (root1, root2) and I am setting path to either root1 or root1/child1 for root1 and child1. In order to find the path for child1, I will have to make at-least 2 calls to form the path. Is there an efficient way to fill the path, since we deal with a very large number of root nodes and children which are nested 5-7 levels deep.

create table foo (id, name, parent_id, path)
insert into foo (1, "root1', null, null)
insert into foo (2, "child1', 1, null)

root1 (path = null)
  child1 (path = root1)
    subchild1 (path = root1/child1)

root2
   child2
     subchild2
Meherzad
  • 8,433
  • 1
  • 30
  • 40
Sam
  • 8,387
  • 19
  • 62
  • 97

5 Answers5

11

You can go with a stored procedure as you have mentioned in your question as the nesting can be up to 7 level deep.

Stored Procedure

CREATE PROCEDURE updatePath()
BEGIN
declare cnt, n int;
    select count(*) into n from foo where parent_id is null;
    update foo a, foo b set a.path = b.name where b.parent_id is null and a.parent_id = b.id;
    select count(*) into cnt from foo where path is null;
    while cnt > n do
        update foo a, foo b set a.path = concat(b.path, '/', b.name) where b.path is not null and a.parent_id = b.id;
        select count(*) into cnt from foo where path is null;
    end while;
END//

To check the actual record we just printed the plain records having null value in path column

select * from foo

Results:

| ID |         NAME | PARENT_ID |   PATH |
------------------------------------------
|  1 |        root1 |    (null) | (null) |
|  2 |       child1 |         1 | (null) |
|  3 |    subchild1 |         2 | (null) |
|  4 |       child2 |         1 | (null) |
|  5 |       child3 |         1 | (null) |
|  6 |    subchild2 |         4 | (null) |
|  7 | subsubchild1 |         6 | (null) |

Calling the procedure:

call updatepath

Result after procedure execution:

select * from foo

Results:

| ID |         NAME | PARENT_ID |                   PATH |
----------------------------------------------------------
|  1 |        root1 |    (null) |                 (null) |
|  2 |       child1 |         1 |                  root1 |
|  3 |    subchild1 |         2 |           root1/child1 |
|  4 |       child2 |         1 |                  root1 |
|  5 |       child3 |         1 |                  root1 |
|  6 |    subchild2 |         4 |           root1/child2 |
|  7 | subsubchild1 |         6 | root1/child2/subchild2 |

SQLFIDDLE

Hope this helps....

Meherzad
  • 8,433
  • 1
  • 30
  • 40
  • +1 for such a good answer. I regret for commenting here, but i need your help in solving a similar **[Question](http://stackoverflow.com/questions/13493127/i-want-generate-xml-file-in-a-hierarchical-form)**. Thanks. – Prahalad Gaggar Apr 11 '13 at 06:29
  • 2
    Interesting solution. It might possibly be more efficient to use SELECT ROW_COUNT() within the loop to get the updated record count rather than getting the number of rows where the path is still null (the loop could then just be changed to check the number of updated records was greater than 0). Further this would prevent the possibility of an endless loop should there be a child record where the parent id does not exist. – Kickstart Apr 11 '13 at 08:36
  • thanks ..it worked like a charm to me...but one problem I am facing is that when I call this proc from my java application..it causes locks and procedure runs indefinetly causing me to kill the application.checking innodb engine status tells me that is waiting for update query inside the while loop...any idea why it is behaving from the application transaction which just calls this proc – Rips Jul 27 '15 at 18:25
  • Answering my own comment..my proc was going into an infinte loop while running from the java app.reason was that I used select rowcount() to find the updated count and used the same in the while loop.Now the where clause in update is "where b.path is not null and a.parent_id = b.id" eventhough I can see from workkbench that if columns to be updated are same it shown 0 rows changed,5 found but while running from java app it was finding always 5 and causing it to run in infinite loop.I needed to modify the update statement to add "a.path is null". more detail at http://goo.gl/6ZJg3g – Rips Jul 29 '15 at 06:25
1

I really like Modified Preorder Tree Traversal. It allows you to get an entire tree heirarchy in a single query. Here is a detailed tutorial: http://www.sitepoint.com/hierarchical-data-database-2/

If you have any questions about MPTT just let me know and I'd be glad to help!

chrislondon
  • 12,487
  • 5
  • 26
  • 65
  • When designing from scratch and the data is fairly static then this is probably the best solution. Bit of a problem to implement it against an existing database and (in my experience) causes a major overhead if there is any reasonable amount of inserts and deletes. – Kickstart Apr 09 '13 at 09:15
0

While not strictly possible in a single call, you can hide multiple calls but putting them into a MySQL function which you call from your SQL, which returns the parent path.

While this would likely be more efficient than doing it in a script I wouldn't expect it to be that efficient.

If the max number of levels is fixed you could possibly use JOINs as follows:-

SELECT foo.id, foo.name, CONCAT_WS(',', d.name, c.name, b.name, a.name)
FROM foo
LEFT OUTER JOIN foo a ON foo.parent_id = a.id
LEFT OUTER JOIN foo b ON a.parent_id = b.id
LEFT OUTER JOIN foo c ON b.parent_id = c.id
LEFT OUTER JOIN foo d ON c.parent_id = d.id

Although this will work it is quite restrictive (ie, if the max number of levels changes you would have to change every bit of SQL using this), plus if the number of levels is anything other than small it will become an unreadable mess.

Kickstart
  • 21,403
  • 2
  • 21
  • 33
0

You could consider adding a closure table that contains all the paths from each of the tree roots to the leaves. Maintaining the transitive closure of graphs in SQL (from 1999) describes some of the theoretical background.

The stackoverflow review question on Hierarchical data describes a number of the alternative approaches. In there, Tegiri Nenashi points to a comprehensive bibliography that includes Hierarchical data in RDBMSs.

Closure table have the advantage that the queries are efficient, and the solution does not affect your current data structure. You will need to extend your with the closure table and maintain it when the forest is modified.

You indicate a large number of items in the table with short paths. This makes the closure table performance remain nearly linear with appropriate indices. You can also retain the path to each node in the close table to avoid recalculation. The approach has a constant number of queries for each operation, and supports hierarchies of any depth.

Community
  • 1
  • 1
Pekka
  • 3,529
  • 27
  • 45
0

You should use nested sets model http://en.wikipedia.org/wiki/Nested_set_model

Michael Sivolobov
  • 12,388
  • 3
  • 43
  • 64