4

I have a table in my database where I store a tree structure using the hybrid Nested Set (MPTT) model (the one which has lft and rght values) and the Adjacency List model (storing parent_id on each node).

my_table (id, parent_id, lft, rght, alias)

This question doesn't relate to any of the MPTT aspects of the tree but I thought I'd leave it in in case anyone had a good idea about how to leverage that.

I want to convert a path of aliases to a specific node. For example: "users.admins.nickf" would find the node with alias "nickf" which is a child of one with alias "admins" which is a child of "users" which is at the root. There is a unique index on (parent_id, alias).

I started out by writing the function so it would split the path to its parts, then query the database one by one:

SELECT `id` FROM `my_table` WHERE `parent_id` IS NULL AND `alias` = 'users';-- 1
SELECT `id` FROM `my_table` WHERE `parent_id` = 1 AND `alias` = 'admins';   -- 8
SELECT `id` FROM `my_table` WHERE `parent_id` = 8 AND `alias` = 'nickf';    -- 37

But then I realised I could do it with a single query, using a variable amount of nesting:

SELECT `id` FROM `my_table` WHERE `parent_id` = (
    SELECT `id` FROM `my_table` WHERE `parent_id` = (
        SELECT `id` FROM `my_table`
        WHERE `parent_id` IS NULL AND `alias` = 'users'
    ) AND `alias`  = 'admins'
) AND `alias` = 'nickf';

Since the number of sub-queries is dependent on the number of steps in the path, am I going to run into issues with having too many subqueries? (If there even is such a thing)

Are there any better/smarter ways to perform this query?

nickf
  • 537,072
  • 198
  • 649
  • 721
  • Why don't you build a test table and find out how deep the query (recursion) stack is? – lexu May 28 '10 at 04:11
  • It 'feels wrong' to not be able to use the most discriminating selector ('nickf') first .. maybe if you build it by joining the queries, instead of using subqueries? That would avoid the recursion (not realy, but looks like it) => cascading self join! – lexu May 28 '10 at 04:15
  • I wrote that 'self-join cascade' out of curiosity .. see my answer. – lexu May 28 '10 at 04:28

2 Answers2

3

Does this work?

select r0.id 
  from my_table as r0 
  join my_table as r1 on(r0.parent_id = r1.id) 
  join my_table as r2 on(r1.parent_id = r2.id)
 where r0.alias='nickf'
   and r1.alias='admins'
   and r2.alias='users'
   and r2.parent_id is null

Seems to me there is not really a need for nested subqueries ..

or am I wrong, missing something?

lexu
  • 8,766
  • 5
  • 45
  • 63
2

I wondered about this myself, and was looking for something that didn't get slower as you went deeper (meaning more levels in both options above.) The problem I have with "my version", is that it has to create every possible path before it narrows the result down to the one you're actually searching for... so I think lexu's version should outperform mine even for very large nesting because it's a simple join, but I'm hoping someone might see it and wish to expand on it further.

Also, this way of doing it would definitely benefit from a stored proc, and/or a view of the 'paths' part of it (without the HAVING clause). Perhaps with those it is a better solution, but I unfortunately don't know enough at this time about SQL performance to say for sure. I can say that mine gets slower as the data (number of possible combination of paths) gets larger, but with a view (since the result is cached, and using that to narrow it down) it seems fast (the largest data set I found was 370 total, at some point I'll create a much larger set to test.)

SELECT node.id, GROUP_CONCAT(parent.alias
                 ORDER BY parent.lft SEPARATOR '.') AS path_name
FROM my_table AS node, my_table AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rght
GROUP BY node.id HAVING path_name = 'users.admins.nickf'
zbateson
  • 1,044
  • 10
  • 11
  • Anyone that happens across this, I did test it with a large dataset in MySQL. It is very slow, and the use of views didn't help unfortunately. – zbateson Nov 02 '13 at 00:40