
I have a data source that I read from and try to load into my CosmosDB Graph. Each row that I fetch contains information about multiple entities (person, software). What I am trying to do is:

  • verify whether the vertices already exist and create a vertex for each missing entry (person, software)
  • verify if an edge already exists (between this person and this software)
  • create an edge between them

I have been referring to the following topics: "CosmosDB Graph : “upsert” query pattern" and "Add edge if not exist using gremlin", trying to combine them, but without much success.

I have tried the following:

g.V().has('person','name','vadas').
  fold().coalesce(unfold(), addV('person').property('name','vadas')).as('v').
  V().has('software', 'name','ripple').
  fold().coalesce(unfold(), addV('software').property('name','ripple')).
  coalesce(__.inE('created').where(outV().as('v')), addE('created').from('v'))

but it only creates the vertices without the edge between them.

I am also wondering whether there is a more common approach to something like:

1. Upsert entity A and keep a reference to it
2. Upsert entity B and keep a reference to it
3. Upsert entity C and keep a reference to it
....
1. Upsert edge between A and B
2. Upsert edge between A and C
user2128702

1 Answer


You have a reducing barrier step (i.e. fold()) that comes between your step label "v" and the where() in which you try to retrieve it. Note what happens more explicitly in the example below:

gremlin> g.V().has('person','name','vadas').
......1>   fold().coalesce(unfold(), addV('person').property('name','vadas')).as('v').
......2>   V().has('software', 'name','ripple').
......3>   fold()
==>[v[5]]
gremlin> g.V().has('person','name','vadas').
......1>   fold().coalesce(unfold(), addV('person').property('name','vadas')).as('v').
......2>   V().has('software', 'name','ripple').
......3>   fold().select('v')
gremlin>

As you can see, select('v') returns nothing: the path history for that traverser is gone. The history is lost because fold() reduces the stream from many traversers to one, so "v" loses its context in that merge.
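To make the effect of the barrier more concrete, here is a deliberately simplified toy model in plain Python. It is not how TinkerPop is implemented, and the names `Traverser`, `as_`, `fold` and `select` are hypothetical stand-ins chosen only to mirror the Gremlin steps. Each traverser carries its labelled history, and the fold-like function merges all traversers into one new traverser that has no history of its own:

```python
from dataclasses import dataclass, field


@dataclass
class Traverser:
    """Toy traverser: a current value plus the values remembered under step labels."""
    value: object
    labels: dict = field(default_factory=dict)


def as_(traversers, label):
    """Toy as(label): remember each traverser's current value under the label."""
    return [Traverser(t.value, {**t.labels, label: t.value}) for t in traversers]


def fold(traversers):
    """Toy fold(): merge all traversers into one that holds a list.

    The merged traverser has no single history to inherit, so it starts
    with no labels at all.
    """
    return [Traverser([t.value for t in traversers])]


def select(traversers, label):
    """Toy select(label): traversers that never saw the label are filtered out."""
    return [Traverser(t.labels[label], t.labels) for t in traversers if label in t.labels]


labelled = as_([Traverser("v[2]")], "v")
before_fold = select(labelled, "v")        # label still reachable
after_fold = select(fold(labelled), "v")   # label lost in the merge

print([t.value for t in before_fold])  # ['v[2]']
print([t.value for t in after_fold])   # []
```

The empty second result corresponds to the empty console output above: once the stream has been reduced, there is nothing left for select('v') to look up.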

When this sort of thing happens you typically just need to rewrite your traversal a bit. In your case I might do something like this, as it requires the least change and stays readable:

gremlin> g.V().has('person','name','vadas').
......1>   fold().
......2>   coalesce(unfold(), addV('person').property('name','vadas')).
......3>   V().has('software', 'name','ripple').
......4>   fold().
......5>   coalesce(unfold(), addV('software').property('name','ripple')).
......6>   coalesce(__.inE('created').where(outV().has('person','name','vadas')), 
......7>            addE('created').from(V().has('person','name','vadas')))
==>e[24][2-created->5]

UPDATE: As of TinkerPop 3.6.0, the fold()/coalesce()/unfold() pattern has been largely replaced by the new mergeV() and mergeE() steps, which greatly simplify the Gremlin required for an upsert-like operation. Under 3.6.0 and newer versions, you would write:

gremlin> g.mergeV([(label): 'person', name: 'vadas']).as('vadas').
......1>   mergeV([(label): 'software', name: 'ripple']).as('ripple').
......2>   mergeE([(from):outV, (to): inV, (label): 'created']).
......3>     option(outV, select('vadas')).
......4>     option(inV, select('ripple'))
==>e[0][2-created->5]

It's even nicer if you can use vertex identifiers, which limits the search criteria for mergeV() strictly to T.id and avoids late binding of vertices for mergeE():

gremlin> g.mergeV([(id): 2]).
......1>     option(onCreate, [(label): 'person', name: 'vadas']).as('vadas').
......2>   mergeV([(id): 5]).
......3>     option(onCreate, [(label): 'software', name: 'ripple']).as('ripple').
......4>   mergeE([(from): 2, (to): 5, (label): 'created'])
==>e[0][2-created->5]
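For the more general flow from the question (upsert A, B and C while keeping a reference to each, then upsert the edges), the intended semantics can be sketched with a small in-memory model in plain Python. This is only an illustration of the idempotent "get or create" idea, not the TinkerPop or CosmosDB API: `ToyGraph`, `upsert_vertex` and `upsert_edge` are hypothetical names, and the third entity 'lop' is made up for the example.

```python
class ToyGraph:
    """Minimal in-memory graph used only to illustrate upsert semantics."""

    def __init__(self):
        self.vertices = {}   # vertex id -> (label, properties)
        self.edges = set()   # (out vertex id, edge label, in vertex id)
        self._next_id = 1

    def upsert_vertex(self, label, **props):
        """Return the id of a vertex matching label and props, creating it if missing."""
        for vid, (vlabel, vprops) in self.vertices.items():
            if vlabel == label and all(vprops.get(k) == v for k, v in props.items()):
                return vid
        vid = self._next_id
        self._next_id += 1
        self.vertices[vid] = (label, dict(props))
        return vid

    def upsert_edge(self, out_id, label, in_id):
        """Ensure the edge exists; return True only if it had to be created."""
        key = (out_id, label, in_id)
        created = key not in self.edges
        self.edges.add(key)
        return created


graph = ToyGraph()
for _ in range(2):  # loading the same row twice must not duplicate anything
    # Steps 1..n: upsert each entity and keep a reference (here, its id).
    vadas = graph.upsert_vertex("person", name="vadas")
    ripple = graph.upsert_vertex("software", name="ripple")
    lop = graph.upsert_vertex("software", name="lop")
    # Then upsert the edges using the kept references.
    graph.upsert_edge(vadas, "created", ripple)
    graph.upsert_edge(vadas, "created", lop)

print(len(graph.vertices), len(graph.edges))  # 3 2
```

The point of the sketch is the shape of the flow: every vertex upsert hands back a reference, and the edge upserts are expressed purely in terms of those references, which is the role the vertex identifiers play in the last traversal above.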
stephen mallette