
I'm using the Azure Cosmos DB graph database.

Does anyone know if there is a way to add an edge between two vertices only if it doesn't already exist (using a Gremlin graph query)?

I can do that when adding a vertex, but not with edges. I took the code to do so from here:

g.inject(0).coalesce(__.V().has('id', 'idOne'), addV('User').property('id', 'idOne'))

Thanks!

3 Answers


It is possible to do this with edges. The pattern is conceptually the same as for vertices and centers on coalesce(). Using the "modern" TinkerPop toy graph to demonstrate:

gremlin> g.V().has('person','name','vadas').as('v').
           V().has('software','name','ripple').
           coalesce(__.inE('created').where(outV().as('v')),
                    addE('created').from('v').property('weight',0.5))
==>e[13][2-created->5]

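Here we add an edge between "vadas" and "ripple", but only if it doesn't already exist. The key is the check in the first argument to coalesce(): inE('created').where(outV().as('v')) yields an existing "created" edge into "ripple" only if its out-vertex is "vadas" (the vertex labeled "v"), and only when that yields nothing does coalesce() fall through to addE().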
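If you build this query from Python and send it as a script string, splicing names into the text with string formatting is fragile. A minimal sketch, assuming your server accepts script parameters (bindings): the hypothetical helper below keeps the traversal text fixed and passes the values separately. The function name and the binding names `from_name`, `to_name`, `edge_label` and `edge_weight` are arbitrary choices for illustration.

```python
from typing import Any, Dict, Tuple


def build_edge_if_missing(from_name: str, to_name: str, edge_label: str,
                          weight: float) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: the coalesce() "add edge if missing" traversal
    from the answer above, with every value supplied as a binding."""
    query = (
        "g.V().has('person', 'name', from_name).as('v')."
        "V().has('software', 'name', to_name)."
        # Existing edge: an incoming edge whose out-vertex is the one labeled 'v'.
        "coalesce(__.inE(edge_label).where(outV().as('v')),"
        # Otherwise create it from 'v' to the current vertex.
        " __.addE(edge_label).from('v').property('weight', edge_weight))"
    )
    bindings = {
        "from_name": from_name,
        "to_name": to_name,
        "edge_label": edge_label,
        "edge_weight": weight,
    }
    return query, bindings


if __name__ == "__main__":
    query, bindings = build_edge_if_missing("vadas", "ripple", "created", 0.5)
    print(query)
    print(bindings)
```

With gremlinpython, the pair could then be sent as `client.submit(query, bindings)`.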

UPDATE: As of TinkerPop 3.6.0, the fold()/coalesce()/unfold() pattern has been largely replaced by the new mergeV() and mergeE() steps, which greatly simplify the Gremlin required for an upsert-like operation. Under 3.6.0 and newer versions, you would write:

gremlin> g.V().has('person','name','vadas').as('v2').
......1>   V().has('software','name','ripple').as('v5').
......2>   mergeE([(from): outV, (to): inV, (T.label): 'created']).
......3>     option(onCreate, [weight: 0.5]).
......4>     option(outV, select('v2')).
......5>     option(inV, select('v5'))
==>e[13][2-created->5]

If you know the vertex ids, you can pass them directly, which makes it even easier:

gremlin> g.mergeE([(from): 2, (to): 5, (T.label): 'created']).
......1>     option(onCreate, [weight: 0.5])
==>e[13][2-created->5]
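To make the match-or-create behavior of mergeE() concrete without a running server, here is a toy in-memory model in plain Python. It is not TinkerPop code: it only considers the label, the from/to vertex ids and the onCreate/onMatch property maps, the class and method names are made up for illustration, and ids simply count up from whatever `next_id` you pass.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Edge:
    edge_id: int
    label: str
    out_v: Any  # id of the "from" vertex
    in_v: Any   # id of the "to" vertex
    props: Dict[str, Any] = field(default_factory=dict)


class ToyGraph:
    """Toy model of mergeE() semantics; not a TinkerPop implementation."""

    def __init__(self, next_id: int = 0) -> None:
        self.edges: List[Edge] = []
        self._next_id = next_id

    def merge_e(self, label: str, out_v: Any, in_v: Any,
                on_create: Optional[Dict[str, Any]] = None,
                on_match: Optional[Dict[str, Any]] = None) -> Edge:
        # Search on label and direction, like [(from): out_v, (to): in_v, (T.label): label].
        for edge in self.edges:
            if edge.label == label and edge.out_v == out_v and edge.in_v == in_v:
                if on_match:
                    edge.props.update(on_match)  # roughly option(onMatch, [...])
                return edge
        # No match: create the edge, applying the onCreate properties.
        edge = Edge(self._next_id, label, out_v, in_v, dict(on_create or {}))
        self._next_id += 1
        self.edges.append(edge)
        return edge


if __name__ == "__main__":
    g = ToyGraph(next_id=13)
    first = g.merge_e("created", 2, 5, on_create={"weight": 0.5})
    again = g.merge_e("created", 2, 5, on_create={"weight": 0.5})
    print(first.edge_id, again.edge_id, len(g.edges))  # 13 13 1
```

The point is idempotence: a second call with the same search keys returns the existing edge instead of creating a duplicate, which is what the coalesce()-based pattern had to emulate by hand.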
stephen mallette
  • Hi Stephen, I don't quite get how the query works. What is the point of the second `.as('v')` in the query? As I understand it, `__.inE('created').where(outV().as('v'))` returns any edges that have the label `created` from the vertex `ripple` to another vertex (which can be any vertex), and then labels that vertex as `v`. So if this vertex is different from `vadas`, would the original query create an edge between `vadas` and `ripple`? – Hieu Nguyen Sep 24 '19 at 01:41
  • Well, there is a special syntax for `as('v')` when it is used the way I've used it. Inside `where()`, it effectively filters "vadas" (the value at the step labeled "v") against what is in `outV()` for the "created" edge. So the query checks for the existence of the edge between "vadas" and "ripple" and returns it or, if it is not present, creates it. – stephen mallette Sep 24 '19 at 10:45
  • Thanks Stephen, it is really nice to know about this. As you mentioned, this use of `.as('v')` is quite special. Is it mentioned somewhere in the documentation (or any other document)? The description here http://tinkerpop.apache.org/docs/current/reference/#as-step does not mention it. – Hieu Nguyen Sep 24 '19 at 17:10
  • I looked earlier in the reference documentation and didn't see it either :/ There are examples in the more "advanced Gremlin" recipes, though, for instance here: http://tinkerpop.apache.org/docs/current/recipes/#betweeness-centrality – stephen mallette Sep 24 '19 at 17:44
  • The use of `__.inE(...)` is an expensive query, especially with a large number of connections. I've posted an answer that uses vertex lookups instead. – Alexander Aavang Mar 30 '20 at 15:30
  • How can we change the query so that it updates the edge's properties if the edge already exists? – Phoenix Sep 03 '20 at 12:56
  • Just add `property()` to the first traversal argument provided to `coalesce()`. – stephen mallette Sep 03 '20 at 13:48
  • I'm trying this out using Node.js and I keep getting this error: `Server error: Neither the map, sideEffects, nor path has a v-key: WhereEndStep(v) (500)`. What could be the problem? I'm using your query except that I don't need the `weight` property. – Uche Ozoemena Apr 13 '22 at 14:42

The performance of the accepted answer isn't great, since it uses inE(...), which is an expensive operation.

This query is what I use for my work in CosmosDB:

g.E(edgeId).
fold().
coalesce(
   unfold(),
   g.V(sourceId).
   has('pk', sourcePk).
   as('source').
   V(destinationId).
   has('pk', destinationPk).
   addE(edgeLabel).
   from('source').
   property(T.id, edgeId)
)

This uses the id and partition keys of each vertex for cheap lookups.
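This approach needs the edge id up front. One way to have the same id on every retry, sketched here as an assumption rather than anything Cosmos DB requires, is to derive it deterministically from the source id, edge label and destination id (here with `uuid.uuid5`) and to send the query with bindings. The helper names are hypothetical, the `'pk'` property name comes from the query above, and the child traversal is started with `__` instead of `g`, as suggested in the comment below. Note that this scheme allows at most one edge per (source, label, destination) triple.

```python
import json
import uuid
from typing import Any, Dict, Tuple

# Any fixed namespace works; it just has to stay the same across runs.
EDGE_NAMESPACE = uuid.NAMESPACE_OID


def deterministic_edge_id(source_id: str, edge_label: str, dest_id: str) -> str:
    # json.dumps gives an unambiguous encoding even if ids contain separators.
    name = json.dumps([source_id, edge_label, dest_id])
    return str(uuid.uuid5(EDGE_NAMESPACE, name))


def build_edge_by_id(source_id: str, source_pk: str, dest_id: str, dest_pk: str,
                     edge_label: str) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: the edge-id based query above, with bindings."""
    query = (
        "g.E(edgeId).fold().coalesce("
        "unfold(),"
        "__.V(sourceId).has('pk', sourcePk).as('source')."
        "V(destinationId).has('pk', destinationPk)."
        "addE(edgeLabel).from('source').property(T.id, edgeId))"
    )
    bindings = {
        "edgeId": deterministic_edge_id(source_id, edge_label, dest_id),
        "sourceId": source_id,
        "sourcePk": source_pk,
        "destinationId": dest_id,
        "destinationPk": dest_pk,
        "edgeLabel": edge_label,
    }
    return query, bindings


if __name__ == "__main__":
    query, bindings = build_edge_by_id("u1", "p1", "u2", "p2", "follows")
    print(query)
    print(bindings)
```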

Alexander Aavang
  • This is a good approach (though I'd prefer using `__` to spawn an anonymous traversal rather than "g"), but it assumes you know the edge id. Perhaps that is a common case for CosmosDB, but it would be unlikely for other graph databases. Moreover, depending on the graph database you use, edge id lookups end up being translated to a vertex lookup followed by a lookup of the associated edge, so this approach would involve looking up a vertex twice. Anyway, thanks for the additional answer; it might be helpful to CosmosDB users. – stephen mallette Mar 30 '20 at 15:49

I have been working on similar issues, trying to avoid duplicating vertices or edges. The first snippet is a rough example of how I make sure I am not duplicating a vertex:
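
    import re
    # hypothetical example values
    category_, pos_ = "dog", "noun"
    # Get-or-create a 'word' vertex: fold() + coalesce(unfold(), addV(...)).
    # Note: re.escape() escapes regex metacharacters, not Gremlin string quotes.
    query = (
        "g.V().has('word', 'name', '%s').fold()"
        ".coalesce(unfold(),"
        "addV('word')"
        ".property('name', '%s')"
        ".property('pos', '%s')"
        ".property('pk', 'pk'))"
        % (re.escape(category_), re.escape(category_), re.escape(pos_))
    )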
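A note on the escaping: `re.escape()` is meant for regular expressions, so it escapes regex metacharacters (for example `.` becomes `\.`) rather than Gremlin string delimiters, and since Python 3.7 it leaves `'` unescaped, so a value like `o'clock` still breaks out of the quoted Gremlin string. Below is a sketch of the same get-or-create query with the values passed as bindings instead, assuming your server accepts parameterized scripts; `build_word_upsert`, `wordName` and `wordPos` are hypothetical names.

```python
import re
from typing import Dict, Tuple


def build_word_upsert(name: str, pos: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: get-or-create a 'word' vertex with values as bindings."""
    query = (
        "g.V().has('word', 'name', wordName).fold()"
        ".coalesce(unfold(),"
        " addV('word')"
        ".property('name', wordName)"
        ".property('pos', wordPos)"
        ".property('pk', 'pk'))"  # literal 'pk' value kept from the original
    )
    return query, {"wordName": name, "wordPos": pos}


if __name__ == "__main__":
    # re.escape() targets regex syntax: the quote survives unescaped (Python 3.7+).
    print(re.escape("o'clock"))  # o'clock
    query, bindings = build_word_upsert("o'clock", "noun")
    print(query)
    print(bindings)
```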

The second is how I make sure there isn't already an edge between the two words in either direction:
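
    # hypothetical example values
    word_1, word_2, weight = "dog", "cat", 0.5
    query = (
        "x = g.V().has('word', 'name', '%s').next()\n"
        "y = g.V().has('word', 'name', '%s').next()\n"
        # only an edge with this weight whose other end is x (either direction) counts
        "g.V(y).bothE('distance').has('weight', %f).where(otherV().is(x)).fold()"
        ".coalesce("
        "unfold(),"
        # child traversals must be anonymous (__), not spawned from g
        "__.addE('distance').from(x).to(y).property('weight', %f)"
        ")"
        % (word_1, word_2, weight, weight)
    )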

So, if a "distance" edge with that weight already exists between x and y in either direction (x -> y or y -> x), it skips producing another one. Otherwise it falls through to the final option of creating x -> y.
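The same check can also be written as a single traversal with no Groovy variables and no `next()` calls, by labeling the first vertex and using `otherV()` inside `where()` the same way the first answer uses `outV()`. A sketch with bindings, assuming a server that accepts parameterized scripts; as in the code above, an existing edge with a different weight does not count as a match. `build_distance_edge_upsert` and the binding names are hypothetical.

```python
from typing import Any, Dict, Tuple


def build_distance_edge_upsert(word_1: str, word_2: str,
                               weight: float) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: add x -> y unless an edge already links x and y."""
    query = (
        "g.V().has('word', 'name', word1).as('x')."
        "V().has('word', 'name', word2)."
        "coalesce("
        # An edge in either direction counts if its other end is 'x'.
        "__.bothE('distance').has('weight', w).where(otherV().as('x')),"
        # Otherwise create x -> y (y is the current vertex).
        " __.addE('distance').from('x').property('weight', w))"
    )
    return query, {"word1": word_1, "word2": word_2, "w": weight}


if __name__ == "__main__":
    query, bindings = build_distance_edge_upsert("dog", "cat", 0.5)
    print(query)
    print(bindings)
```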

Let me know if anyone knows of a more concise solution. I am still very new to Gremlin and would love a cleaner answer, though this one appears to suffice.

When I tried the previously posted solutions, running my code twice produced a new edge on each run, because they test only one direction before creating an edge.

Adam Boyle
  • `g.V().has('Person', 'name', 'Ben').fold().coalesce(unfold().property('age', 25), __.addV('Person').property('name','Ben').property('age',25)).store('start').V().has('Person','name','Robert').fold().coalesce(unfold().property('age', 41), __.addV('Person').property('name','Robert').property('age',41)).coalesce(__.outE('link').has('id', 3).property('weight', 11), __.addE('link').property('id', 3).property('weight', 10).to(select("start").unfold()))` Adds or updates two vertices and an edge and associates them at the same time. – Robert Green MBA Apr 12 '22 at 19:57