I'm using multi-threaded transactions as described in the JanusGraph docs. Each of my threads contributes to building a directory tree. Before inserting a new vertex for a directory, each thread checks, within the same statement, whether such a vertex already exists. A vertex is only inserted via .orElseGet if no existing one is found.

Vertex vertex = graph.traversal().V()
    .hasLabel(VertexLabels.DIRECTORY)
    .has(PropertyKeys.PATH, directory.path())
    .tryNext()
    .orElseGet(() -> {
        return graph.addVertex(
            T.label, VertexLabels.DIRECTORY,
            PropertyKeys.PATH, directory.path());
    });

In principle, this should prevent duplicates, assuming that all threads operate within the same transactional scope. However, I still encounter duplicates. The docs don't seem to address this issue. Can you confirm whether multi-threaded transactions operate within the same scope?

Double M

1 Answer

Multi-threaded transactions operate in the same scope, but the threads can still race if you haven't configured a unique constraint on PropertyKeys.PATH: two threads can both find no matching vertex and then both add one. Configuring the constraint enables locking, which might slow down your ingestion rate but will ensure uniqueness.
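For illustration, a unique constraint of this kind could be declared through the JanusGraph management API roughly as follows. This is only a sketch under assumptions: the key name "path" stands in for whatever PropertyKeys.PATH resolves to, the index name "directoryByPath" is made up, and the property key is assumed not to exist yet (for an existing key you would look it up with mgmt.getPropertyKey(...) and reindex existing data).

```java
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.ConsistencyModifier;
import org.janusgraph.core.schema.JanusGraphIndex;
import org.janusgraph.core.schema.JanusGraphManagement;

public final class DirectorySchema {

    private DirectorySchema() {
    }

    /** Declares a unique composite index on the (assumed) "path" property key. */
    public static void definePathUniqueness(JanusGraph graph) {
        JanusGraphManagement mgmt = graph.openManagement();

        // Assumption: "path" is the string behind PropertyKeys.PATH and is not defined yet.
        PropertyKey path = mgmt.makePropertyKey("path")
                .dataType(String.class)
                .make();

        // unique() asks JanusGraph to allow at most one vertex per "path" value.
        // "directoryByPath" is a hypothetical index name.
        JanusGraphIndex byPath = mgmt.buildIndex("directoryByPath", Vertex.class)
                .addKey(path)
                .unique()
                .buildCompositeIndex();

        // LOCK makes the storage backend enforce the constraint across
        // concurrent transactions, at the cost of acquiring locks on write.
        mgmt.setConsistency(byPath, ConsistencyModifier.LOCK);

        mgmt.commit();
    }
}
```

With locking in place, a conflicting insert should fail rather than silently create a duplicate, so the ingestion code will most likely need to catch the failure and retry or re-read the vertex.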

As a side note, please consider avoiding the Graph API (graph.addVertex()) and sticking to pure Gremlin. The "get or create" pattern is described here.
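As a rough sketch, the commonly used fold()/coalesce() form of "get or create" looks like this in Java. The class name, method name, and the "directory" and "path" strings are assumptions standing in for VertexLabels.DIRECTORY and PropertyKeys.PATH from the question. Note that this only moves the check and the insert into a single traversal; it does not by itself remove the race between threads, which still needs the unique constraint.

```java
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.structure.Vertex;

public final class DirectoryVertices {

    // Assumed stand-ins for VertexLabels.DIRECTORY and PropertyKeys.PATH.
    private static final String DIRECTORY = "directory";
    private static final String PATH = "path";

    private DirectoryVertices() {
    }

    /** Returns the directory vertex for the given path, adding it if none is found. */
    public static Vertex getOrCreate(GraphTraversalSource g, String path) {
        return g.V()
                .hasLabel(DIRECTORY)
                .has(PATH, path)
                // fold() turns "no match" into an empty list instead of ending the traversal.
                .fold()
                .coalesce(
                        // A match exists: unwrap the list and return the vertex.
                        __.unfold(),
                        // No match: add the vertex with pure Gremlin instead of graph.addVertex().
                        __.addV(DIRECTORY).property(PATH, path))
                .next();
    }
}
```

A call would look like DirectoryVertices.getOrCreate(tx.traversal(), directory.path()), where tx is assumed to be the shared transaction object the threads already use.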

stephen mallette