
I have an object and I need to keep a history of all changes made to it. How would I implement this using neo4j?

Aaron Digulla

2 Answers


As with an RDBMS, this depends on your domain and your query requirements.

Does your application require regular access to all versions of the object, or usually just to the most recent one, with the older versions reachable via the current one? An example of this could be pages on Wikipedia. Let's say we have a page which is on version 3. We could then model this as follows:

(pages)-[:PAGE]->(V3)-[:PREV]->(V2)-[:PREV]->(V1)
   ^               ^
   |               |
category        current
  node      version of page
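
To make the pointer handling concrete, here is a minimal in-memory sketch of this first model. It is written in Python rather than Cypher so that it runs without a database. The `Node` class and the `add_version` and `history` helpers are hypothetical names for illustration only, not part of any Neo4j API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a graph node; `rels` maps relationship type to target."""
    name: str
    rels: dict = field(default_factory=dict)


def add_version(pages: Node, new: Node) -> None:
    """Insert `new` at the head of the chain, like a linked-list insert."""
    old = pages.rels.get("PAGE")      # current version, if any
    if old is not None:
        new.rels["PREV"] = old        # new version points back to the old one
    pages.rels["PAGE"] = new          # re-point the category at the new version


def history(pages: Node) -> list:
    """Walk PREV relationships from the current version back to the oldest."""
    names, node = [], pages.rels.get("PAGE")
    while node is not None:
        names.append(node.name)
        node = node.rels.get("PREV")
    return names


pages = Node("pages")
for name in ("V1", "V2", "V3"):
    add_version(pages, Node(name))

print(history(pages))  # ['V3', 'V2', 'V1']
```

In a real graph the same two steps apply: the `[:PAGE]` relationship is moved to the new version, and a `[:PREV]` relationship is created from the new version to the old one.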

Here, only the current version is part of the main structure, but you may want all versions to be part of it. In that case, you could use relationship properties to indicate the version and link every page version directly from the category node:

  (V1)
    ^
    |
[:PAGE(v=1)]
    |
 (pages)-[:PAGE(v=2)]->(V2)
    |
[:PAGE(v=3)]
    |
    v
  (V3)

Here, you can immediately traverse to a particular version of the page by simply specifying the version in which you are interested.
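
As a rough illustration of looking a version up by a relationship property, the sketch below models each relationship as a plain tuple in Python. The tuple layout and the `get_version` and `latest` helpers are assumptions made for this example; a real query would filter on the `v` property of the `[:PAGE]` relationship instead.

```python
from typing import Optional

# Each relationship is (type, properties, target). All names are illustrative.
page_rels = [
    ("PAGE", {"v": 1}, "V1"),
    ("PAGE", {"v": 2}, "V2"),
    ("PAGE", {"v": 3}, "V3"),
]


def get_version(rels, version: int) -> Optional[str]:
    """Return the target of the PAGE relationship whose `v` property matches."""
    for rel_type, props, target in rels:
        if rel_type == "PAGE" and props.get("v") == version:
            return target
    return None


def latest(rels) -> Optional[str]:
    """Return the target with the highest `v`, i.e. the current version."""
    page = [r for r in rels if r[0] == "PAGE"]
    return max(page, key=lambda r: r[1]["v"])[2] if page else None


print(get_version(page_rels, 2))  # V2
print(latest(page_rels))          # V3
```

Note that in this layout the current version is not marked explicitly; it is simply the one with the highest version number.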

A third option could be that you wish all older versions to be completely separate from the main structure. For this you could use multiple category nodes, one for (current_pages) and another for (old_pages). As each page is superseded by a new version, it becomes unlinked from the former category and instead linked to the latter. This would form more of an "archive" type of system where the older versions could even be moved into a separate database instance.
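
The unlink-and-relink step of this archive approach can be sketched as follows. The two dictionaries stand in for the (current_pages) and (old_pages) category nodes, and `publish` is a hypothetical helper name, so treat this as an illustration of the bookkeeping rather than as Neo4j code.

```python
current_pages = {}   # page id -> current version, stands in for (current_pages)
old_pages = {}       # page id -> superseded versions, stands in for (old_pages)


def publish(page_id: str, version: str) -> None:
    """Make `version` current and move the superseded version to the archive."""
    previous = current_pages.get(page_id)
    if previous is not None:
        # Unlink from the current category, link to the archive category.
        old_pages.setdefault(page_id, []).append(previous)
    current_pages[page_id] = version


for v in ("V1", "V2", "V3"):
    publish("home", v)

print(current_pages)  # {'home': 'V3'}
print(old_pages)      # {'home': ['V1', 'V2']}
```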

So you have these three options, plus more that I haven't thought of! Neo4j gives you great flexibility with this sort of design and there is no single "right" answer. If none of these inspire you, however, post a little more information about your domain so that the answer can be tailored to your needs.

Cheers, Nige

Nigel Small
  • Thanks, that got me on the right track. I think I'll mix the two approaches because for some types of access, I always need the current version (i.e. I'll create a `[:CURRENT]` type of relation to speed that up) but for others, I need to query a specific version, so I'll add a version property to the relation. – Aaron Digulla Oct 04 '12 at 09:46
  • @AaronDigulla I want to build a similar scenario. I have two questions: 1) Let's assume that I want to store versioned nodes of a `Car` and that I have an index on the `carId` (UUID) field. Isn't it a problem to end up with 3 versioned `Car`s that all have exactly the same indexed field (a potential conflict with a query like "Retrieve the Car 32")? 2) With your `[:CURRENT]` solution, I imagine that for each new version you have to break the previous `[:CURRENT]` relationship and point it to the new `Car` version, which in turn points to the previous `Car` with a `[:PREVIOUS]` relationship? – Mik378 Feb 27 '14 at 11:36
  • For your first question, I'd consider making the UUID represent an immutable item, i.e. a combination of car + version. Each revision would then create a new UUID and these could then be chained together to represent the version history. For your second question: yes - the relationships would need to be broken and rebuilt for each new version that appears (similarly to inserting an item into a linked list). – Nigel Small Feb 27 '14 at 12:15
  • @Mik378: I suggest you ask a new question for this. – Aaron Digulla Feb 27 '14 at 13:05
  • @NigelSmall Actually, my UUIDs are randomly generated (through Apache UUID). Therefore, I plan to generate a brand new UUID for each version, as if each version of Car was a brand new Car. Could it do the trick? Of course, linked through Relationships, expressing the versions. As Aaron suggested, a new question would be useful. – Mik378 Feb 27 '14 at 13:29
  • @AaronDigulla Here's my newly created post: http://stackoverflow.com/questions/22073512/neo4j-strategy-for-keeping-history-of-node-changes – Mik378 Feb 27 '14 at 15:50

You could also approach it from the other side:

(pages)-[:VERSION]->(V1)-[:VERSION]->(V2)-[:VERSION]->(V3)
   ^                                                   ^
   |                                                   |
category                                            current
  node                                          version of page

Advantage: when you create a new version, you just add it at the end of the chain; there is no need to "insert" it between the (pages) node and the current version.

Disadvantage: you can't just throw away old versions unless you reconstruct the chain. But this is probably not a frequent operation.
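
Both points can be seen in a small in-memory sketch of this chain. It is plain Python, not Neo4j code, and the `Version` class and the helper functions are hypothetical names. Appending only touches the tail, while dropping an old version means relinking its neighbours.

```python
class Version:
    """Hypothetical node; `next` stands in for the outgoing [:VERSION] relationship."""

    def __init__(self, name):
        self.name = name
        self.next = None


def append_version(head, new):
    """Add `new` at the end of the chain; existing links are left untouched."""
    node = head
    while node.next is not None:
        node = node.next
    node.next = new


def remove_version(head, name):
    """Drop one version by relinking its predecessor to its successor."""
    node = head
    while node.next is not None:
        if node.next.name == name:
            node.next = node.next.next   # the chain is reconstructed around the gap
            return True
        node = node.next
    return False


def chain(head):
    """List version names from oldest to current."""
    names, node = [], head.next
    while node is not None:
        names.append(node.name)
        node = node.next
    return names


pages = Version("pages")
for name in ("V1", "V2", "V3"):
    append_version(pages, Version(name))

print(chain(pages))           # ['V1', 'V2', 'V3']
remove_version(pages, "V2")
print(chain(pages))           # ['V1', 'V3']
```

One consequence visible in this sketch is that reaching the current version means walking to the end of the chain, unless a separate pointer to it is kept.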

Graphileon