8

I need help with the model of my neo4j graph structure for a time dependent domain. See the following sketch for the requirement and problem:

Problem sktech

  • Picture 1 & 2: For each day I have nodes and relationships between them. I define the relationship as co-occurrence between two nodes (e.g words) in some lexical unit (sentences). The same node can occur on several days with new nodes or already existing once. See the following example, where we only consider named entities for nodes:

    • 2013/01/01: Peter was wondering about Cassandra tonight.
    • 2013/01/01: Cassandra wants to stay at home with Peter.
    • ....
    • 2013/01/08: Peter was fallen in love with Judith.
    • 2013/01/08: Cassandra drives Peter to school every day.

    This will result in the graph structure below.

     - 2013/01/01:
    
        (Peter) <--2--> (Cassandra)
    
     - 2013/01/08
    
        (Peter) <--1--> (Judith)
    
        (Peter) <--1--> (Cassandra)
    
  • Picture 3: The graph structure should support to select a certain time span and get a path from a starting point (P1) to an end point (P2). Here the path is given by the max flow between those two nodes with respect to the accumulated nodes and relations for the specific time span.

  • Picture 4: It should also be possible to expand nodes according to e.g the highest remaining edge weight. Picture 4 shows the expanded graph with 3 additional nodes.

I already know this work 2 and the multi-level index 3 example. The first model do not support good path finding between nodes from different frames. Only the latter one will be helpful for querying time ranges. Hope somebody can help.

Regards.

user2715478
  • 1,273
  • 12
  • 17
  • Something like FluxGraph for Neo4j would be nice. See: http://www.slideshare.net/datablend/fluxgraph-a-timemachine-for-your-graphs. If someone has a good idea for storing such snapshots let me know. How about labeling each node with a timestamp? Does this scale well? Maybe I can create such a plugin for neo4j. Regards. – user2715478 Jun 06 '14 at 22:55

1 Answers1

4

There are numerous ways to model time in a graph. One way is adding a timestamp, or even start/end time of the period in which the relation was valid. That way, you can query the graph to return subgraphs, or paths, which were valid at a given time.

Ian Robinson (one of the authors of the Graph Databases book) has written a very good blog post about this topic: http://iansrobinson.com/2014/05/13/time-based-versioned-graphs/

Regarding performance, it is true that accessing relationships is a bit more expensive than querying only by relationship type, but you probably will need to benchmark for yourself, with your own data set, so I would suggest to start with the simplest model which works for you, and then optimize performance iteratively, if necessary.

Axel Morgner
  • 2,292
  • 16
  • 15
  • Thanks. The blog post looks great. But where I should add properties of relationships that change over time? The simplest approach I could think of would be to add for each change a new relationship between identity nodes (including the timespan from/to and the new value). – user2715478 Jun 09 '14 at 12:56
  • Yes. If a property of a relation (relation: class, relationship: instance of a possible relation) changes, you need to create a new relationship with the changed property value and a different timestamp/validity period. – Axel Morgner Jun 09 '14 at 21:55
  • Thank you! I will test this approach with my data set. As far as I know from the gremlin documentation, it should be feasible to sum the edge weight for a given timespan und traverse the max flow path. I havn't seen something like this for cypher. Found only the documentation of the java traversal framework api. – user2715478 Jun 10 '14 at 16:38
  • To get the sum of weights of a path, you can use Cypher's SUM() function: http://docs.neo4j.org/chunked/stable/query-aggregation.html#aggregation-sum. See also http://stackoverflow.com/questions/19689095/cypher-sum-property-on-incoming-outgoing-relationships-neo4j which might help. – Axel Morgner Jun 11 '14 at 00:53
  • In the mean time I implemented the algorithm using the neo4j core API. For the widest path I used the dijkstra algorithm for weighted paths and a custom costEvaluator. I applied a trick (Int.MaxValue - cost) to retrieve the widest path. Seems to work. Unfortunately I cannot limit the path length with this given algorithm. Anyway I have a question regarding the changing of nodes over time. If only one property of these nodes changes will it be better to model the change as a relationship to the node itself with a timestamp and new value or use a relationship with a new node and the new value? – user2715478 Jun 19 '14 at 21:24
  • 4
    This url http://iansrobinson.com/2014/05/13/time-based-versioned-graphs/ is no longer valid. Does anyone know where can I find the article ? – Alexander Petrov Feb 27 '19 at 11:53
  • 2
    Unfortunately, the whole site seems to be gone, so no luck. However, there are other articles about this topic, see https://stackoverflow.com/questions/45669949/neo4j-how-to-model-a-time-versioned-graph, https://softwareengineering.stackexchange.com/questions/323444/graph-database-keeping-historical-relationships and even a scientific papers like https://www.uni-marburg.de/fb12/arbeitsgruppen/swt/forschung/publikationen/2012/TELW12.pdf and https://researcher.watson.ibm.com/researcher/files/us-leejinho/Multiversion_support_for_property_graph_databases___BigData_2017.pdf. – Axel Morgner Feb 28 '19 at 12:43
  • 1
    Someone copied the text from the article and backed it up here: https://gist.github.com/imranansari/1fb97b6137b88b5d2a36e00b901eff15#file-time-versioning-in-graphs-md – Benjamin R Jul 14 '19 at 16:29