4

Is it possible to sort by a materialized path tree's path text field in order to find the right-most node of the tree? For example, consider this python function that uses django-treebeard's MP_Node:

def get_rightmost_node():
    """Returns the rightmost node in the current tree.

    :rtype: MyNode
    """
    # MyNode is a subclass of django-treebeard's MP_Node.
    return MyNode.objects.order_by('-path').first()

From all my testing, it seems to return what I expect, but I don't know how to come up with the math to prove it. And I haven't found any info on performing this operation on a materialized path tree.

Treebeard's implementation doesn't have separators in paths, so the paths look like this: 0001, 00010001, 000100010012, etc.

nnyby
  • 4,748
  • 10
  • 49
  • 105
  • 2
    If the path is stored in a varchar column, I would be concerned that a path like `4\3\1\11` would end up coming before a path like `4\3\1\5`, which should technically be first – Paul Griffin Apr 27 '15 at 13:45

4 Answers4

4

Short answer: No.

Here is a SQLFiddle demonstrating the problem I described in my comment.

For this simple setup:

id, path
1,  '1'
2,  '1\2'
3,  '1\3'
4,  '1\4'
5,  '1\5'
6,  '1\6'
7,  '1\7'
8,  '1\8'
9,  '1\9'
10, '1\10'

attempting to get the rightmost leaf (id = 10) with a simple sort will fail:

SELECT TOP 1
  id,
  path
FROM hierarchy
ORDER BY path DESC

returns:

id, path
9,  1\9

Because path is a text-based column, 1\10 will come after 1\9 in the descending sort (See the results of the second query in the fiddle).

Even if you began tracking depth and path length, which are generally cheap and easy to keep up with, it would be entirely possible to get paths like this:

path       depth  length
12\3\11\2  4      9
5\17\10\1  4      9

which would still not sort properly.

Even if you are using letters instead of numbers, this only pushes the problem horizon out to the 26th child instead of the 10th:

SQLFiddle using letters

I am not as familiar with materialized path operations as I am with nested set and adjacency lists, and have no experience with django, so I'll defer to others if there are methods of which I am unaware, but you will almost certainly have to perform some sort of parsing on the path column to consistently get the correct leaf.

EDIT - Having addressed the question of whether sorting is a valid solution, here are some additional notes on other potential solutions after a bit of discussion and thinking on the problem a bit:

-"Rightmost" is a vague term when nodes can have more than two children (i.e., the tree is not a binary tree). If a node has 10 children, which are to the left of the parent, and which are to the right? You must define this condition before you can define a solution to the problem.

-Once "rightmost" is properly defined for your problem space, understand that the rightmost node will not necessarily be on the lowest level of the tree:

        1
       / \
    1\1   1\2 <= This is the rightmost node
    /
  1\1\1 <= This is the lowest node

-Once "rightmost" is defined, a simple loop can be used to programatically find the rightmost node:

//in pseudocode
function GetRightmostNode(Node startNode)
{
  Node currentNode = startNode;

  while(currentNode.RightChildren != null)
  {
    currentNode = maximum of currentNode.RightChildren;
  }

  return currentNode;
}

This loop will look for children of the current node to the right of the current node. If they exist, it selects the rightmost of the right children and repeats. Once it reaches a node with no children to its right, it returns the current node, as it has found the rightmost node of the tree (or subtree) with startNode as its root.

Paul Griffin
  • 2,416
  • 15
  • 19
3

Is it possible to sort by a materialized path tree's path text field in order to find the right-most node of the tree?

No. If node paths are stored like '/1/3/6/2' for instance, consider:

/1
/1/3
/1/3/6/2
/1/3/6/5
/1/3/6/21
/1/40

Refer to Paul's answer for the reasons sorting the above wouldn't work.

All hope is not lost though. If you're searching for "the right-most node", by which I assume you mean the deepest nodes in a tree, you could simply count the separators. For instance:

select length(regexp_replace('/1/3/6/2', '[^/]+', '', 'g')) as depth;

If you're looking for the max of that, use something like:

order by length(regexp_replace(path, '[^/]+', '', 'g')) desc

... or the equivalent python code. Index options include indexing the same expression, or storing the result in a separate depth field and indexing that.

If you're still interested in the actual value of the ID, the numbers in the above usually correspond to the ID, so order further using that column. If they're different, extract the right-most figure using a different regular expression, and cast it to integer so as to sort them naturally (1, 11, 2) instead of lexicographically (1, 11, 2):

select regexp_replace('/1/3/6/2', '^.+/', '')::int as value;
Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
  • This will get you the deepest level, but will not reliably get you the rightmost leaf on that level. You will still have to parse `/1/3/6/5` and `/1/3/6/21` to discover which number is greater. A simple sort will put `21` before `5` – Paul Griffin May 01 '15 at 16:01
  • @PaulGriffin: As I understand the question, OP would want both in any order, or something else if there's anything with depth 5 or more. If he wants to sort them, 5 and 21 would correspond to the IDs and he can sort the result set using that column instead. (If it's not, `regexp_replace(path, '^.+/', '')::int` and call it a day.) – Denis de Bernardy May 01 '15 at 17:43
  • For that matter, the rightmost node might not be on the deepest level. Consider: `/1, /1/1, /1/1/1, /1/2`. Despite `/1/1/1` being the deepest node, `/1/2` is the farthest right. – Paul Griffin May 01 '15 at 18:25
  • @PaulGriffin: again, the question as I understand it is not "sort rightmost node" but indeed "can I use sort to find the rightmost node". The answer is no, and OP needs a means to actually find the rightmost node. But I may be understanding the question wrong, of course. – Denis de Bernardy May 01 '15 at 20:48
  • Sorry if I have sounded disagreeable, this question has been very interesting to me and I have enjoyed the discussion here. I understand that the answer to the OP's question is "no, you can't sort to get the rightmost node", and that you are trying to provide a potential means actually get the rightmost node. What I am trying to point out that the rightmost node is not necessarily on the deepest level of the tree, A method that always returns a node on the deepest level will not always return a valid result. – Paul Griffin May 02 '15 at 14:21
  • @PaulGriffin - Hehe, no worries. To me rightmost is just a bizarre way of saying deepest, and that might indeed be what OP meant if he's a non-native English speaker. But I may very well be incorrect in that assumption (in which case I've no idea what it's about). – Denis de Bernardy May 02 '15 at 14:31
0

EDIT: Paul Griffin rightly pointed out that my answer was unreliable, because it assumed that the nodes would be under a certain value. Here is a better attempt, incorporating two spins on Denis de Bernardy's depth function.

Use two sort criteria, one for the depth, and then one more for the value of the leftmost node converted to an integer:

SELECT path, 
       length(regexp_replace(path, '[^/]+', '', 'g')) as depth,
       regexp_replace(path, '^.*/', '')::int as last       
FROM test 
ORDER BY depth DESC, last DESC;

This will put the deepest node with the highest value at the top.

SQLFiddle

McCroskey
  • 1,091
  • 1
  • 11
  • 21
-1

You can use the method @Paul explained with a bit of modifications.You can append 0 in front of every digit and can have a uniformity in length of each path.

The nodes can be assigned path as,

id |  path
-----------------
1  |  '01'
2  |  '01\01'
3  |  '01\02'
4  |  '01\03'
5  |  '01\04'
6  |  '01\04\01'
7  |  '01\04\02'
8  |  '01\04\03'
9  |  '01\05\01'
10 |  '01\05\02'
11 |  '01\05\03'
12 |  '01\05\04'

The above example can be used if the number of child nodes of the node having maximum number of child nodes is less than 100.

If it's between 100 and 1000, then you can add an extra 0 as 001\003\002\005 like wise.

Then you can get the right most node 12 as,

SELECT TOP 1 id
FROM tree
ORDER BY path DESC

You can find the demo here. Demo

  • This is not a knock on your answer, more like a warning for posterity... If you are *certain* your data will never exceed set bounds, these sorts of solutions are workable. BUT, you would do well to be wary of solutions that are this dependent on data. I have been bitten too often in the past by changing specs or data sets that grew several orders of magnitute beyond their expected size and started breaking solutions like this. Repair/refactoring/maintenance at that point is a nigh impossible, and eats more time and money than if you had just written a more general solution to begin with. – Paul Griffin Apr 30 '15 at 14:57
  • I agree with you @PaulGriffin. If the level exceeds beyond the limit, refactoring will be a real herculean task. – Nikhil K Mamotty Apr 30 '15 at 15:48