
I'm trying to write a Java wrapper that will help me upsert (or even just insert) vertices in a Gremlin server.

I realize that different tools may support other methods, e.g. AWS Neptune (which is my primary target backend) has a bulk loading API that reads data from S3, but I'm trying to avoid things that are only supported by one implementation as my requirements may evolve to include supporting other backends (and I'd like to support at least TinkerGraph, for unit testing). I don't plan on upserting many vertices per batch; this is not for populating a huge graph from scratch, this is for adding to an existing graph from a stream of data.

The TinkerPop documentation mentions addVertex but discourages its use, recommending g.addV() instead. My problem with g.addV(), and the Gremlin DSL in general, is that although it's not too hard to write one-off queries with it (assuming you understand Gremlin well, which I don't), I can't figure out how to build queries dynamically.

Given as input a set of vertices and their properties, with arbitrary cardinality (let's say it comes from a CSV file of anywhere between 1-100 lines), how do I dynamically build a graph traversal that upserts all of the vertices? Alternate question: Is there a backend-agnostic tool for loading a few hundred vertices' worth of data in a local or remote graph?

This seems to be a Java issue as much as a Gremlin one. I have tried building a graph traversal by composing functions with Function#andThen(Function) but I am quickly running into issues with Java's generics because graph traversal methods return GraphTraversal<S, E> where both S and E depend on the actual graph traversal method. E.g. addV() returns GraphTraversal<S, Vertex> and addE() returns GraphTraversal<S, Edge>.

Fabrice Gabolde

1 Answer

In Java (or any GLV), you can build up a query in code and then submit it by appending a terminal step [1]:

query = g.addV('test').property(id,'v1')

query = query.addV('test').property(id,'v2')
query = query.addV('test').property(id,'v3')
// so on and so forth - or you can do this in a loop

// and then submit it with
result = query.iterate()
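The snippet above uses Gremlin-Groovy syntax. As a rough Java sketch of the same loop, assuming the tinkergraph-gremlin artifact (3.5 or later) is on the classpath: the class name ChainedAddV and the helper buildInsert are made up for illustration. Declaring the variable as GraphTraversal<?, ?> is one way to avoid naming S and E, which change depending on the last step appended.

```java
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

import java.util.Arrays;
import java.util.List;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public class ChainedAddV {

    // Hypothetical helper: appends one addV() per id to a single traversal
    // and returns it without executing it. Returns null when ids is empty.
    static GraphTraversal<?, ?> buildInsert(GraphTraversalSource g, String label, List<String> ids) {
        GraphTraversal<?, ?> query = null;
        for (String id : ids) {
            if (query == null) {
                query = g.addV(label);      // first step starts from the source
            } else {
                query = query.addV(label);  // later steps extend the same traversal
            }
            query = query.property(T.id, id);
        }
        return query;
    }

    public static void main(String[] args) {
        // In-memory TinkerGraph, as one might use in a unit test.
        GraphTraversalSource g = traversal().withEmbedded(TinkerGraph.open());
        GraphTraversal<?, ?> query = buildInsert(g, "test", Arrays.asList("v1", "v2", "v3"));
        query.iterate();                            // terminal step: runs the whole chain
        System.out.println(g.V().count().next());   // prints 3
    }
}
```

Because the wildcard type is assignable from any GraphTraversal, the same variable could also hold a chain that ends in addE(), at the cost of losing the element type for typed terminal steps such as next().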

There are other (perhaps more clever) methods of doing this, such as injecting a list of maps and unfolding it:

g.inject([
    [ id: 'v347', label: 'test', name: 'Son' ],
    [ id: 'v348', label: 'test', name: 'Messi' ],
    [ id: 'v349', label: 'test', name: 'Suarez' ],
    [ id: 'v350', label: 'test', name: 'Kane' ]
]).unfold().
   addV(select('label')).
   property(id,select('id')).
   property('name',select('name'))
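Since the question mentions CSV input, the list of maps passed to inject() could be assembled in plain Java before any Gremlin is involved. The following is only a sketch using the JDK: the class name CsvToInjectPayload and the helper toRows are hypothetical, and the comma split is naive (no quoted or escaped fields).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvToInjectPayload {

    // Hypothetical helper: the first line is the header, every following
    // non-empty line becomes one map keyed by the header names.
    // Assumption: fields never contain commas or quotes.
    static List<Map<String, Object>> toRows(List<String> csvLines) {
        List<Map<String, Object>> rows = new ArrayList<>();
        if (csvLines.isEmpty()) {
            return rows;
        }
        String[] header = csvLines.get(0).split(",", -1);
        for (String line : csvLines.subList(1, csvLines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split(",", -1);
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < cells.length; i++) {
                row.put(header[i].trim(), cells[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = toRows(Arrays.asList(
                "id,label,name",
                "v347,test,Son",
                "v348,test,Messi"));
        // prints [{id=v347, label=test, name=Son}, {id=v348, label=test, name=Messi}]
        System.out.println(rows);
    }
}
```

The resulting list has the same shape as the literal in the traversal above, so the whole batch travels as a single injected value and the traversal text itself does not change with the number of rows.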

TinkerPop 3.6 also introduces the mergeV() and mergeE() steps [2]. At the time of this writing, however, the latest Neptune engine only supports up to TinkerPop 3.5.3 (3.6 support is forthcoming).
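For backends that do support 3.6, a minimal Java sketch of an upsert with mergeV() might look like the following. It assumes tinkergraph-gremlin 3.6 or later on the classpath; the class name MergeVUpsert and the helper upsertAll are made up for illustration, and it issues one small traversal per vertex rather than a single batched one.

```java
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public class MergeVUpsert {

    // Hypothetical helper: for each id, mergeV() looks for a vertex matching
    // the map (id and label here) and creates it only when none is found.
    static void upsertAll(GraphTraversalSource g, String label, List<String> ids) {
        for (String id : ids) {
            Map<Object, Object> search = new LinkedHashMap<>();
            search.put(T.id, id);
            search.put(T.label, label);
            g.mergeV(search).iterate();
        }
    }

    public static void main(String[] args) {
        GraphTraversalSource g = traversal().withEmbedded(TinkerGraph.open());
        List<String> ids = Arrays.asList("v1", "v2");
        upsertAll(g, "test", ids);
        upsertAll(g, "test", ids); // second run matches existing vertices, adds nothing
        System.out.println(g.V().count().next()); // prints 2
    }
}
```

Property updates on match or on create are configured through the option() modulator described in the reference documentation [2]; they are left out here to keep the sketch small.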

[1] https://tinkerpop.apache.org/docs/current/reference/#terminal-steps

[2] https://tinkerpop.apache.org/docs/current/reference/#mergevertex-step

Taylor Riggan
  • Looking forward to mergeV being available in Neptune as the fold/coalesce/unfold/addV idiom is fairly cumbersome. Your second option with inject() is interesting and it might be easier to build from this than the first one which is what I'd tried. The problem with it is that the type of `query` changes if I try to do for instance addV and addE within the same graph traversal. – Fabrice Gabolde Nov 24 '22 at 14:34
  • In the end this answer allowed me to rethink my approach, simplify my code, and just modify a graph traversal repeatedly until all the addV calls were chained. Before this I had gotten mired in a version with function composition. The code became much simpler once I took it out. There is still an issue where I tried to return the resulting graph traversal (to hand it off to my query executor object), because the traversal's full type is not known in advance, depending on if it ends by adding an edge or a vertex. – Fabrice Gabolde Nov 28 '22 at 16:09