16

I need to version (simple) Java object graphs stored in a document-oriented database (MongoDB). For relational databases and Hibernate, I discovered Envers and am amazed by the possibilities. Is there something similar that can be used with Spring Data Documents?

I found this post outlining the thoughts I had (and more...) about storing the object versions. My current implementation works similarly: it stores copies of the objects in a separate history collection with a timestamp. I would like to improve this to save storage space, so I think I need to implement both a "diff" operation on object trees and a "merge" operation for reconstructing old objects. Are there any libraries out there that help with this?

Edit: Any experience with MongoDB and versioning is highly appreciated! It looks like there most probably won't be a Spring Data solution.

Matthias Wuttke
  • Not full versioning, but we've implemented a tiny auditing system - logging who changed which old values to new ones. We're using Morphia's ``prePersist()`` method (which will only work for full entity saves, not specific updates). Can provide some code samples, but it's nothing sophisticated... – xeraa Aug 25 '12 at 20:08
  • Thanks for your comment! I would be very interested in some more details demonstrating your solution. Only tracking full entity saves is definitely OK: this is our main use case, too. A very interesting point is the way you compare the old entity to the new one and identify the changed properties. I looked into graph comparison frameworks here, but did not find a quick and easy solution. – Matthias Wuttke Aug 27 '12 at 16:41

3 Answers

15

This is how I ended up implementing versioning for MongoDB entities. Thanks to the StackOverflow community for helping!

  • A change log is kept for each entity in a separate history collection.
  • To avoid saving a lot of data, the history collection does not store complete instances, but only the first version and differences between versions. (You could even omit the first version and reconstruct the versions "backwards" from the current version in the main collection of the entity.)
  • Java Object Diff is used to generate object diffs.
  • In order to be able to work with collections correctly, one needs to implement the equals method of the entities so that it tests for the database primary key and not the sub properties. (Otherwise, JavaObjectDiff will not recognize property changes in collection elements.)
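
The last point can be sketched as follows. This is a minimal illustration rather than code from the answer: the class name `LetterSection` is borrowed from the MongoDB example in this answer, while the fields, constructor, and `String` key type are assumptions made for the sketch.

```java
import java.util.Objects;

// Hypothetical entity: identity is defined by the database primary key only.
class LetterSection {
    private final String id;      // database primary key (assumed to be a String here)
    private String stringContent; // ordinary property, deliberately ignored by equals

    LetterSection(String id, String stringContent) {
        this.id = id;
        this.stringContent = stringContent;
    }

    String getId() { return id; }
    String getStringContent() { return stringContent; }
    void setStringContent(String stringContent) { this.stringContent = stringContent; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        LetterSection that = (LetterSection) other;
        // Entities without a key (not yet saved) are only equal to themselves.
        return id != null && id.equals(that.id);
    }

    @Override
    public int hashCode() {
        // Consistent with equals: derived from the key only.
        return Objects.hashCode(id);
    }
}
```

With an `equals` like this, two instances that share a key but differ in `stringContent` count as the same collection element, which is what lets a differ report a changed property instead of one removed and one added element.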

Here are the entities I use for versioning (getters/setters etc. removed):

// One history entry is stored (1:1) for each versioned entity,
// in its own collection
public class MongoDiffHistoryEntry {
    /* history id */
    private String id;

    /* reference to original entity */
    private String objectId;

    /* copy of original entity (first version) */
    private Object originalObject;

    /* differences collection */
    private List<MongoDiffHistoryChange> differences;

    /* delete flag */
    private boolean deleted;
}

// changeset for a single version
public class MongoDiffHistoryChange {
    private Date historyDate;
    private List<MongoDiffHistoryChangeItem> items;
}

// a single property change
public class MongoDiffHistoryChangeItem {
    /* path to changed property (PropertyPath) */
    private String path;

    /* change state (ADDED, CHANGED, REMOVED etc.) */
    private Node.State state;

    /* original value (empty for ADDED) */
    private Object base;

    /* new value (empty for REMOVED) */
    private Object modified;
}

Here is the saveChangeHistory operation:

private void saveChangeHistory(Object working, Object base) {
    assert working != null && base != null;
    assert working.getClass().equals(base.getClass());

    String baseId = ObjectUtil.getPrimaryKeyValue(base).toString();
    String workingId = ObjectUtil.getPrimaryKeyValue(working).toString();
    assert baseId != null && workingId != null && baseId.equals(workingId);

    MongoDiffHistoryEntry entry = getObjectHistory(base.getClass(), baseId);
    if (entry == null) {
        //throw new RuntimeException("history not found: " + base.getClass().getName() + "#" + baseId);
        logger.warn("history lost - create new base history record: {}#{}", base.getClass().getName(), baseId);
        saveNewHistory(base);
        saveHistory(working, base);
        return;
    }

    final MongoDiffHistoryChange change = new MongoDiffHistoryChange();
    change.setHistoryDate(new Date());
    change.setItems(new ArrayList<MongoDiffHistoryChangeItem>());

    ObjectDiffer differ = ObjectDifferFactory.getInstance();
    Node root = differ.compare(working, base);
    root.visit(new MongoDiffHistoryChangeVisitor(change, working, base));

    if (entry.getDifferences() == null)
        entry.setDifferences(new ArrayList<MongoDiffHistoryChange>());
    entry.getDifferences().add(change);

    mongoTemplate.save(entry, getHistoryCollectionName(working.getClass()));
}

This is what it looks like in MongoDB:

{
  "_id" : ObjectId("5040a9e73c75ad7e3590e538"),
  "_class" : "MongoDiffHistoryEntry",
  "objectId" : "5034c7a83c75c52dddcbd554",
  "originalObject" : {
      BLABLABLA, including sections collection etc.
  },
  "differences" : [{
      "historyDate" : ISODate("2012-08-31T12:11:19.667Z"),
      "items" : [{
          "path" : "/sections[LetterSection@116a3de]",
          "state" : "ADDED",
          "modified" : {
            "_class" : "LetterSection",
            "_id" : ObjectId("5034c7a83c75c52dddcbd556"),
            "letterId" : "5034c7a83c75c52dddcbd554",
            "sectionIndex" : 2,
            "stringContent" : "BLABLA",
            "contentMimetype" : "text/plain",
            "sectionConfiguration" : "BLUBB"
          }
        }, {
          "path" : "/sections[LetterSection@19546ee]",
          "state" : "REMOVED",
          "base" : {
            "_class" : "LetterSection",
            "_id" : ObjectId("5034c7a83c75c52dddcbd556"),
            "letterId" : "5034c7a83c75c52dddcbd554",
            "sectionIndex" : 2,
            "stringContent" : "BLABLABLA",
            "contentMimetype" : "text/plain",
            "sectionConfiguration" : "BLUBB"
          }
        }]
    }, {
      "historyDate" : ISODate("2012-08-31T13:15:32.574Z"),
      "items" : [{
          "path" : "/sections[LetterSection@44a38a]/stringContent",
          "state" : "CHANGED",
          "base" : "blub5",
          "modified" : "blub6"
        }]
    }],
  "deleted" : false
}
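
Reconstructing an old version from such a record means replaying the stored changesets on top of the original object. The following is a minimal sketch of that idea, not code from this answer. It assumes a strongly simplified model in which an object is a flat map from property path to value; `HistoryReplay`, `ChangeItem`, and `reconstruct` are hypothetical names. Real paths such as `/sections[...]/stringContent` would also need to be resolved against the object graph, which is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the history classes (assumption: flat property maps).
class HistoryReplay {

    enum State { ADDED, CHANGED, REMOVED }

    static final class ChangeItem {
        final String path;     // property path, used as the map key
        final State state;
        final Object modified; // new value, null for REMOVED

        ChangeItem(String path, State state, Object modified) {
            this.path = path;
            this.state = state;
            this.modified = modified;
        }
    }

    /**
     * Rebuilds the property map as it looked after the first `version`
     * changesets were applied. Version 0 is the original object.
     */
    static Map<String, Object> reconstruct(Map<String, Object> original,
                                           List<List<ChangeItem>> changesets,
                                           int version) {
        if (version < 0 || version > changesets.size()) {
            throw new IllegalArgumentException("unknown version: " + version);
        }
        // Work on a copy so the stored original is never mutated.
        Map<String, Object> result = new LinkedHashMap<>(original);
        for (List<ChangeItem> changeset : changesets.subList(0, version)) {
            for (ChangeItem item : changeset) {
                if (item.state == State.REMOVED) {
                    result.remove(item.path);
                } else {
                    result.put(item.path, item.modified); // ADDED or CHANGED
                }
            }
        }
        return result;
    }
}
```

Because each change item also records the `base` value, the same loop could in principle run in reverse from the current version, which is the "backwards" reconstruction mentioned in the list at the top of this answer.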

EDIT: Here is the Visitor code:

public class MongoDiffHistoryChangeVisitor implements Visitor {

    private MongoDiffHistoryChange change;
    private Object working;
    private Object base;

    public MongoDiffHistoryChangeVisitor(MongoDiffHistoryChange change, Object working, Object base) {
        this.change = change;
        this.working = working;
        this.base = base;
    }

    public void accept(Node node, Visit visit) {
        // Record either an unchanged root node or a changed leaf node
        if ((node.isRootNode() && !node.hasChanges()) ||
            (node.hasChanges() && node.getChildren().isEmpty())) {
            MongoDiffHistoryChangeItem diffItem = new MongoDiffHistoryChangeItem();
            diffItem.setPath(node.getPropertyPath().toString());
            diffItem.setState(node.getState());

            if (node.getState() != State.UNTOUCHED) {
                diffItem.setBase(node.canonicalGet(base));
                diffItem.setModified(node.canonicalGet(working));
            }

            if (change.getItems() == null)
                change.setItems(new ArrayList<MongoDiffHistoryChangeItem>());
            change.getItems().add(diffItem);
        }
    }

}
Matthias Wuttke
8

We're using a base entity (where we set the ID, the creation and last-change dates, and so on). Building upon this, we're using a generic persistence method, which looks something like this:

@Override
public <E extends BaseEntity> ObjectId persist(E entity) {
    delta(entity);
    mongoDataStore.save(entity);
    return entity.getId();
}

The delta method looks like this (I'll try to make this as generic as possible):

protected <E extends BaseEntity> void delta(E newEntity) {

    // If the entity is null or has no ID, it hasn't been persisted before,
    // so there's no delta to calculate
    if ((newEntity == null) || (newEntity.getId() == null)) {
        return;
    }

    // Get the original entity
    @SuppressWarnings("unchecked")
    E oldEntity = (E) mongoDataStore.get(newEntity.getClass(), newEntity.getId()); 

    // Ensure that the old entity isn't null
    if (oldEntity == null) {
        LOG.error("Tried to compare and persist null objects - this is not allowed");
        return;
    }

    // Get the current user and ensure it is not null
    String email = ...;

    // Calculate the difference
    // We need to fetch the fields from the parent entity as well as they
    // are not automatically fetched
    Field[] fields = ArrayUtils.addAll(newEntity.getClass().getDeclaredFields(),
            BaseEntity.class.getDeclaredFields());
    StringBuilder delta = new StringBuilder();
    for (Field field : fields) {
        field.setAccessible(true); // We need to access private fields
        Object oldField;
        Object newField;
        try {
            oldField = field.get(oldEntity);
            newField = field.get(newEntity);
        } catch (IllegalArgumentException e) {
            // Assumes an SLF4J- or Log4j-style logger that accepts a Throwable
            LOG.error("Bad argument given", e);
            continue; // Skip this field rather than compare stale values
        } catch (IllegalAccessException e) {
            LOG.error("Could not access the argument", e);
            continue;
        }
        // Null-safe comparison; requires java.util.Objects (Java 7+)
        if (!Objects.equals(oldField, newField)) {
            delta.append(field.getName()).append(": [").append(oldField).append("] -> [")
                    .append(newField).append("]  ");
        }
    }

    // Persist the difference
    if (delta.length() == 0) {
        LOG.warn("The delta is empty - this should not happen");
    } else {
        DeltaEntity deltaEntity = new DeltaEntity(oldEntity.getClass().toString(),
                oldEntity.getId(), oldEntity.getUuid(), email, delta.toString());
        mongoDataStore.save(deltaEntity);
    }
    return;
}
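
The method above adds the fields of `BaseEntity` by hand, which only covers a single level of inheritance. As a variation, the sketch below walks the whole superclass chain instead. It is self-contained and not part of the original code: `FieldDelta`, `SampleBase`, and `SampleLetter` are hypothetical names, and both arguments are assumed to be instances of the same class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class FieldDelta {

    /** Collects the instance fields of a class and of all its superclasses. */
    static List<Field> allInstanceFields(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                // Skip constants such as serialVersionUID and compiler-generated fields.
                if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                    fields.add(f);
                }
            }
        }
        return fields;
    }

    /** Describes every field whose value differs between the two entities. */
    static <T> String describe(T oldEntity, T newEntity) {
        StringBuilder delta = new StringBuilder();
        for (Field field : allInstanceFields(newEntity.getClass())) {
            field.setAccessible(true); // private fields need to be readable
            try {
                Object oldValue = field.get(oldEntity);
                Object newValue = field.get(newEntity);
                if (!Objects.equals(oldValue, newValue)) {
                    delta.append(field.getName()).append(": [").append(oldValue)
                         .append("] -> [").append(newValue).append("]  ");
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return delta.toString();
    }
}

// Hypothetical entities, used only to exercise the sketch.
class SampleBase {
    private final String id;
    SampleBase(String id) { this.id = id; }
}

class SampleLetter extends SampleBase {
    private final String subject;
    private final int revision;
    SampleLetter(String id, String subject, int revision) {
        super(id);
        this.subject = subject;
        this.revision = revision;
    }
}
```

Calling `FieldDelta.describe(oldLetter, newLetter)` returns an empty string when nothing changed, so the caller can use that to decide whether a delta record needs to be saved at all.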

Our delta entity looks like this (without the getters + setters, toString, hashCode, and equals):

@Entity(value = "delta", noClassnameStored = true)
public final class DeltaEntity extends BaseEntity {
    private static final long serialVersionUID = -2770175650780701908L;

    private String entityClass; // Do not call this className as Morphia will
                            // try to work some magic on this automatically
    private ObjectId entityId;
    private String entityUuid;
    private String userEmail;
    private String delta;

    public DeltaEntity() {
        super();
    }

    public DeltaEntity(final String entityClass, final ObjectId entityId, final String entityUuid,
            final String userEmail, final String delta) {
        this();
        this.entityClass = entityClass;
        this.entityId = entityId;
        this.entityUuid = entityUuid;
        this.userEmail = userEmail;
        this.delta = delta;
    }

    // Getters, setters, toString, hashCode, and equals omitted
}

Hope this helps you get started :-)

xeraa
  • Thank you very much for the sample. I also found a post about java object diffs (http://stackoverflow.com/questions/8001400/is-there-a-java-library-that-can-diff-two-objects) mentioning this library: https://github.com/SQiShER/java-object-diff - maybe I can "spice up" your solution with this diff algorithm. I would like to leave this question open for some more time, maybe there are other ideas. – Matthias Wuttke Aug 31 '12 at 07:40
  • Interesting project, looking forward to your solution. An upvote would still be appreciated in the meantime ;-) – xeraa Aug 31 '12 at 09:03
4

Looks like Javers is the right tool for this job; see http://javers.org/documentation/features/#javers-repository

Javers is conceptually a VCS for domain object versioning, backed by JSON and MongoDB.

Bartek Walacik
  • Actually, I have to revise my earlier comment. I tried using Javers only to find that it is not feasible since it always constructs the current objects from the base version plus all changes, which makes read times about 20x as long as it would be if it simply stored the latest version of a document somewhere. And since getting the latest version of a document is sort of the primary use case, that sort of is a show-stopper in my opinion. – Kira Resari Jun 24 '20 at 08:25