6

I would like to ask if someone knows a performant way to store the path from the root node to a new node of a multiway tree during the insertion of the new node. E.g., if I have the following tree:

Multiway tree

For each node, I currently store an array of the path to the node from the root node during insertion in the following way by assigning a unique int ID to each children on the same depth:

Root node -> [1]

Depth 1, child 1 of root -> [1, 1]
Depth 1, child 2 of root -> [1, 2]

Depth 2, child 1 of parent 1 -> [1, 1, 1]
Depth 2, child 2 of parent 1 -> [1, 1, 2]
Depth 2, child 3 of parent 1 -> [1, 1, 3]
Depth 2, child 1 of parent 2 -> [1, 2, 4]
Depth 2, child 2 of parent 2 -> [1, 2, 5]

Depth 3, child 1 of parent 3 -> [1, 1, 3, 1]

...

If I now insert a new node from the leaf node 1 on depth 3, I would have to create a new path array for it storing all the nodes of the parent 1 (i.e. [1, 1, 3, 1]) plus the new child ID, which is 1 for the first child:

Depth 4, child 1 of parent 1 -> [1, 1, 3, 1, 1]

As my tree grows a lot in height (the number of children per depth is relatively low, but the depth can be high), the slow part of this algorithm would be this array recreation process. Just imagine a tree of depth 1.000.000, if I insert a new node from a node of depth 1.000.000, I would have to create a new array for this new node storing all the 1.000.001 IDs of the parent plus appending the new node's ID:

Depth 1.000.001, child 1 of parent x -> [...1 million and one IDs... , 1]

Is there a more efficient way to store the path on each node during node's insertion?

I basically need this to determine if any given node is a child of a possible parent node in the tree and as I have the path stored in each node, I can easily do that by checking the path array of the child, like this:

// Ex. 1
Is node 4 of depth 2 the parent/ancestor of node 1 of depth 3?

node 1 of depth 3 has the following path array: pathArray = [1, 1, 3, 1]
Its ancestor ID on depth 2 is: pathArray[2] -> 3

3 != 4 and therefore I know that node 4 of depth 2
is not a parent of node 1 of depth 3.

// Ex. 2
Is node 1 of depth 1 the parent/ancestor of node 1 of depth 3?

node 1 of depth 3 has the following path array: pathArray = [1, 1, 3, 1]
Its ancestor ID on depth 1 is: pathArray[1] -> 1

1 == 1 and therefore I know that node 1 of depth 1
is a parent of node 1 of depth 3.

This lookup operation would be fast, the problem is the creation of the path array as the tree goes deeper.

Any suggestions would be appreciated.

Thank you for the attention.

tonix
  • 6,671
  • 13
  • 75
  • 136
  • How often do you insert and query? What about delete? – Yola Aug 25 '19 at 10:41
  • Lookups may be more frequent than inserts, but a lot of inserts may happen for a single branch, leading to a deep tree. Delete and replace operations are also possible, but that's another story and I handle them differently. – tonix Aug 25 '19 at 10:48
  • Do I understand you correctly if your query is more or less of the form: `given a node x, is the k'th node on the path from root to x the node y`? – SecularisticSloth Aug 25 '19 at 11:54
  • Before we get into more complicated answers... the default way is just to have each child store a pointer to its parent, and sometimes its depth. Ancestor queries are then easy to answer just by walking this list from the child up to the parent's depth. What about this approach doesn't work for you? – Matt Timmermans Aug 25 '19 at 12:15
  • @SecularisticSloth Not exactly. I need to know if `given two nodes x and y, x is an ancestor of y`. In doing so, I need to store the path from `root` to `x` and assign `y` a unique ID for its depth/level in the tree. – tonix Aug 25 '19 at 15:52
  • @tonix So in your second example: "Is node 1 of depth 1 the parent/ancestor of node 1 of depth 3?" you only specify the depth of the nodes to uniquely identify them since the same identities appear on multiple levels in the tree? – SecularisticSloth Aug 25 '19 at 17:01
  • @SecularisticSloth Every time I check if node `x` is an ancestor of `y` I already have a reference to `x` and `y` nodes, and I also know the depth of `y` and the depth of `x`. What I don't know is if `x` is effectively an ancestor (it could even be a child of `y`, or maybe it is a node in another branch like in the first example). If with `the same identities appear on multiple levels in the tree` you mean `the same values`, then yes, it could happen, but that's not a problem for me, I'm not interested in the value of the node, I just need to know if the ancestor-child relationship holds. – tonix Aug 25 '19 at 17:58
  • @tonix I've added an answer which I believe addresses your problem. It might be a little unclear, don't hesitate to ask. – SecularisticSloth Aug 25 '19 at 18:30

3 Answers3

2

Right now, your solution has O(1) lookup time, O(h) insert time and O(n^2) space conpelxity, where n is the number of nodes and' h is the height, which is at most n.

You can achieve a tradeoff with O(log n) lookup, O((log n)^2) insert and O(n log n) space in the following way:

Let every node store one jump pointer to each of its ancestors with distance 1 (its parent), 2 (grandparent), 4 (grandparent's grandparent), 8, 16 and so on, until the root is reached or passed. The maximum distance from any node to the root is n, so for every node you store O(log n) jump-pointers. Since you do this for every node, the total space complexity is O(n log n).

Answering the query of whether y is an ancestor of x is trivial if y doesn't have a lower depth than x. Name the depths of the nodes dy and dx. You know that if y is an ancestor of x, then it is the dx-dy'th ancestor of x. That is, if dy = 5 and dx = 17, you know that if y is x's ancestor, then it is 17 - 5 levels above x.

Therefore, you can perform lookups by recursively jumping the largest possible distance upwards in the tree from x without overshooting the target ancestor. For instance, if you're starting at depth 16 and want to find the ancestor at depth 6, you're interested in the ancestor 10 levels above. You cannot jump 16 levels up, as this would overshoot the target ancestor, so you jump 8 instead. Now you're at depth 16-8=8, and the remaining distance to the target ancestor, which is 6, is 2. Since there is a pointer which goes exactly two steps up, you follow that and you've arrived at the target ancestor.

Every time you follow a pointer upwards in the tree, you're getting at least half way to your target, so the maximum number of pointers you can follow is O(log n).

When inserting a node e as a child of another node x you can construct e's jump-pointers by finding x's ancestors with distance 1, 3, 7, 15, etc. (since e is one level further away from all of these than x is). There are O(log n)such searches. As we argued above, each of the lookups take O(log n) time. Thus the total is O((log n)^2).

This operation might even be made even faster by storing some additional information, but I can't see exactly how just now.

NOTE This idea is actually a part of the classical solution for the Level Ancestor Problem. The classical solution allows for lookups as you have described them in O(1) time, while keeping the space of the entire data structure to O(n). However, the data structure is static, so the solution does not specify how to do insertions. There might be a way to adapt the level ancestor to a dynamic scenario and get even better running times than I've described here, but I'm not sure how.

  • Thank you for your answer! I really like this logarithmic idea. Will try to implement it and let you know. I also came up with an idea based on the heuristic assumption that my tree grows more vertically than horizontally, i.e. there may be just a single high, long, deep branch with just a small amount of children per level/depth, if any. I will update my question and would be glad to receive your opinion! – tonix Aug 25 '19 at 19:34
1

Arrays have all their values stored contiguously in the memory. If you want to retain this property you must use them. Or, if you are OK with hoping through several memory locations, you can store in every node only its immediate parent and trace up to the required level to do the check required.

Yola
  • 18,496
  • 11
  • 65
  • 106
  • `store in every node only its immediate parent and trace up` I think this way lookups will be slow. For a leaf deep down in the tree (e.g. with depth `100.000`), I would have to jump up until its node stored on depth `2` if I am testing whether a node on depth `2` is an ancestor or not. Maybe there's a fancy way to use a single array pointed by each node in the tree? The only thing that comes to my mind is: If I'm forking a 1st child from another node, why should I create a new array? I can append to the one pointed by the parent... But then what happens if I fork child 2 from the same parent? – tonix Aug 25 '19 at 11:03
  • @tonix If you will ever find such an array, don't tell anybody, just call me. We will do business together;) About your second thought, I believe, that some insertions will spoil any trick you could think of, to keep lookup time small, for such insertions you would need to recreate large part maybe even whole structure. – Yola Aug 25 '19 at 16:50
  • @tonix I advise you to give folks more information about the problem at hands in another question. Maybe different data structure would do better. – Yola Aug 25 '19 at 16:52
  • `We will do business together` :D That sounds interesting ahah, right now I am looking to just expand my knowledge, but if I ever find something, I promise I will write a comment here in this thread on SO. – tonix Aug 25 '19 at 18:01
  • And yes, I will rethink about my problem and eventually post another question, thank you for your help! – tonix Aug 25 '19 at 18:14
1

Map your nodes in a HashMap<node-id, node>.


Now, when you have to

determine if any given node is a child of a possible parent node,

You can find the exact location of that node in the tree from the HashMap and then travel back up the tree using the parent pointers to see if the possible parent lies on the path to root.

In a fairly balanced tree, this will be O(Log n) run-time (to traverse up the tree) and O(n) space (of HashMap).


If you go with your current design of storing the path from each node to root, then you would have O(Log n) run-time (assuming a balanced tree) and O(n * Log n) space to store the Log n length path for each of the n nodes.

displayName
  • 13,888
  • 8
  • 60
  • 75
  • `You can find the exact location of that node in the tree from the HashMap` you mean that I start the jumps from the possible child `x` of possibile ancestor `y` and jump up until I reach `y` using the depth information of `y` ? What is the difference from SecularisticSloth's proposed approach? – tonix Aug 25 '19 at 19:53
  • @tonix: I think my approach is simple and straightforward. SS’s approach is far more convoluted. (Not implying that it is bad. It may be just what you’re looking for.) Secondly, from a cursory read of that answer’s first para, my solution’s space and run time complexities are lower. – displayName Aug 25 '19 at 20:39
  • 1
    @displaName The first paragraph is the running time of the solution initially proposed by the OP. The second paragraph are the running times of the solution I suggest. I realize that might be a little misleading. However, your assumption that the tree is balanced is not valid I think. As the OP states in the question, the tree grows in height, but not much in width. There are only a few nodes on each level. – SecularisticSloth Aug 25 '19 at 21:26
  • So the running time of the lookup operation would be `O(n)`, (or `O(h)`). Insert would run in `O(1)`, though, and space would be `O(n)`. Unless the tree is unreasonably tall or a large amount of lookup operations will be done, this is probably the way to go. – SecularisticSloth Aug 25 '19 at 21:47
  • @displayName SecularisticSloth is right, the tree is unbalanced and grows a lot in height, as per my question `As my tree grows a lot in height (the number of children per depth is relatively low, but the depth can be high), ...` If I understood correctly, your solution would imply to traverse back all the nodes from the leaf to root in order to find an ancestor in-between. This, however, would be `O(n)` (worst case for a tree consisting of a single branch only without multiway branches) or `O(h)`, where h is the height of the tree. I will update my question asap when I have time! Thank u all! – tonix Aug 26 '19 at 06:31