0

I have a Neo4j graph with directed cycles. I have had no issue finding all descendants of A assuming I don't care about loops using this Cypher query:

match (n:TEST{name:"A"})-[r:MOVEMENT*]->(m:TEST)
return n,m,last(r).movement_time

The relationships between my nodes have a timestamp property on them, movement_time. I've simulated that in my test data below using numbers that I've imported as floats. I would like to traverse the graph using the timestamp as a constraint. Only follow relationships that have a greater movement_time than the movement_time of the relationship that brought us to this node.

Here is the CSV sample data:

from,to,movement_time
A,B,0
B,C,1
B,D,1
B,E,1
B,X,2
E,A,3
Z,B,5
C,X,6
X,A,7
D,A,7

Here is what the graph looks like:

Graph

I would like to calculate the descendants of every node in the graph and include the timestamp from the last relationship using Cypher; so I'd like my output data to look something like this:

Node:[{Descendant,Movement Time},...]
A:[{B,0},{C,1},{D,1},{E,1},{X,2}]
B:[{C,1},{D,1},{E,1},{X,2},{A,7}]
C:[{X,6},{A,7}]
D:[{A,7}]
E:[{A,3}]
X:[{A,7}]
Z:[{B,5}]

This non-Neo4J implementation looks similar to what I'm trying to do: Cycle enumeration of a directed graph with multi edges

Community
  • 1
  • 1
Peter
  • 9,643
  • 6
  • 61
  • 108

1 Answers1

1

This one is not 100% what you want, but very close:

MATCH (n:TEST)-[r:MOVEMENT*]->(m:TEST)
WITH n, m, r, [x IN range(0,length(r)-2) | 
   (r[x+1]).movement_time - (r[x]).movement_time] AS deltas
WHERE ALL (x IN deltas WHERE x>0)
RETURN n, collect(m), collect(last(r).movement_time)
ORDER BY n.name

We basically find all the paths between any of your nodes (beware cartesian products get very expensive on non-trivial datasets). In the WITH we're building a collection delta's that holds the difference between two subsequent movement_time properties.

The WHERE applies an ALL predicate to filter out those having any non-positive value - aka we guarantee increasing values of movement_time along the path.

The RETURN then just assembles the results - but not as a map, instead one collection for the reachable nodes and the last value of movement_time.

The current issue is that we have duplicates since e.g. there are multiple paths from B to A.

As a general notice: this problem is much more elegantly and more performant solvable by using Java traversal API (http://neo4j.com/docs/stable/tutorial-traversal.html). Here you would have a PathExpander that skips paths with decreasing movement_time early instead of collection all and filter out (as Cypher does).

Stefan Armbruster
  • 39,465
  • 6
  • 87
  • 97
  • Oh, this is good! I agree, it's not 100% perfect, but I think it will do the job. Once it comes back I'll parse the result, create pairs of name and movement_time, and then get the distinct set of them. That will get rid of my duplicates and should be what I'm after. I'll play around with it and let you know. – Peter Oct 13 '15 at 21:58
  • In testing this does work, and pruning duplicates was not an issue; but I agree that using Java Traversal will be much better, and performant for large graphs. I think I'll try and get a Java implementation working. – Peter Oct 16 '15 at 02:01