2

Given a tree (not an arbitrary graph) and a node in the tree then is it possible to find the root node of the tree in an efficient way in memgraph?

At the moment I've created an empty memgraph db in docker, and run:

CREATE INDEX ON :Node(id);
FOREACH (i IN range(0,100000) | CREATE (:Node {id: i}));
match (n1:Node),(n2:Node) where n1.id=n2.id-1 create (n1)-[:PARENT]->(n2);

to create a path. I'm interested in performance comparisons with postgres. For instance, using a WITH RECURSIVE query in postgres takes 3.4s for a path of length one million on my laptop.

Have I created the path correctly in memgraph? Starting from the node (:Node {id: 1}) what is the best way to follow the parent pointers and get to the root node?

I've tried 'cheating' by doing the following

CREATE INDEX ON :Node(id);
FOREACH (i IN range(0,10000) | CREATE (:Node {id: i, is_child: 'yes'}));
CREATE (:Node {id: 100001, is_child:'no'});
match (n1:Node),(n2:Node) where n1.id=n2.id-1 create (n1)-[:PARENT]->(n2);
match (n:Node)-[:PARENT*]->(root:Node) WHERE n.id=1 AND root.is_child = 'no' RETURN n, root;

but it runs in 7.4s for a path only of length 100,000.

mwpb
  • 83
  • 7

1 Answers1

3

So regarding the query, you have created the path and nodes correctly. But there is no need to set the property of the node to mark it as child since you know that root or topmost parent has an id=0 you can then find the path from any child node to the target root node.

Here is an example of how it would look like (BFS, Bread First Search):

MATCH (child:Node{id:100000})<-[e:PARENT* BFS]-(parent:Node{id:0})
RETURN child, parent;

I got the following times for (BFS) query above:

  • 576 ms - 1 Milion traversals
  • 87 ms - 100K traversals
  • 9 ms - 10K traversals
  • 1 ms - 1k traversals

On the first go, I tried running DFS, it is default and equal to [e:PARENT*]

    MATCH (child:Node{id:100000})<-[e:PARENT* DFS]-(parent:Node{id:0})
RETURN child, parent;

I would expect it to perform better or at least identically (worst-case) to BFS, but while testing your issue/observation I noticed a significant drop in performance on larger depths. Got similar results to yours. We need to profile what is happening on larger depths in the DFS approach.

EDIT:

If there is no target node for the root (root is identical to children), all paths from the target node (child) need to be considered for finding the longest path in the tree. This increases query complexity. In the Memgraph case index on the label should be created.

The query for this would look like this:

MATCH path = (child:Node{id:100})<-[:PARENT*]-(root:Node)
RETURN root
ORDER BY size(path) DESC
LIMIT 1
Ante Javor
  • 156
  • 1
  • 5
  • Hi Ante, thanks for taking a look at this one. Indeed the performance of the BFS is good as you wrote it. Our use case is slightly different though. We would like to find the root of the tree if we are given only a single node. I.e. there is no way to tell which one is the root apart from that it has no parents. In other words we cannot assume that the labels (in this case 'id' or 'is_child') correspond to positions in the tree. Can your solution be adapted to this case? Thanks! – mwpb Mar 01 '23 at 08:28
  • If roots are not marked with a Label or the ID is unknown this changes things quite a lot. This means you need to find the longest path between the child (target node) and the root node in the relationship direction. – Ante Javor Mar 03 '23 at 15:47
  • During testing, noticed that our DFS is struggling a lot with traversals larger than 10K and some huge memory consumption is popping, which suggests there is a bug inside. I have opened a Github issue, feel free to add extra details: https://github.com/memgraph/memgraph/issues/817 – Ante Javor Mar 03 '23 at 16:07
  • Thanks Ante, this isn't urgent so I've subscribed to that GitHub issue. Will revisit in due course. Thanks! – mwpb Mar 05 '23 at 16:12