In Gremlin how does map() really work?

Question

Why do these two traversals yield different results?

graph.traversal()
   .V().map(__.out("contains"))
   .valueMap(true).next(100)

compared to

graph.traversal()
   .V().out("contains")
   .valueMap(true).next(100)

Why do I prefer map() to calling .out() directly? It lets me organize my code so that methods return traversals, which I can then "map" onto existing traversals.

score 6 · Accepted Answer · answered Jun 25 '18 at 10:49

To reason about this, recall that Gremlin works like a processing pipeline: objects are pulled through each step, which applies some transformation, filter, etc. Reduced to its simplest form, your example gets all vertices and traverses their outgoing edges with out(), so you are comparing the following traversals and results:

gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().out()
==>v[3]
==>v[2]
==>v[4]
==>v[5]
==>v[3]
==>v[3]
gremlin> g.V().map(out())
==>v[3]
==>v[5]
==>v[3]

Those traversals return different results because you are asking Gremlin for two different things. In the first case, out() is not a form of map() but a form of flatMap(): for each vertex traverser coming through the pipeline, it iterates all the outgoing edges and returns each adjacent vertex (i.e. a one-to-many transform). In the second case, you are asking Gremlin for a simple map() of a vertex to another object (i.e. a one-to-one transform). That object is the result of out(), which here means only the first object in the stream that out() produces for that vertex. A vertex with no outgoing edges yields no result at all, which is why only three vertices come back.

To demonstrate you could simply change map() to flatMap() as follows:

gremlin> g.V().flatMap(out())
==>v[3]
==>v[2]
==>v[4]
==>v[5]
==>v[3]
==>v[3]

Alternatively, fold() the results of out() into a single object to keep the one-to-one transform logic:

gremlin> g.V().map(out().fold())
==>[v[3],v[2],v[4]]
==>[]
==>[]
==>[v[5],v[3]]
==>[]
==>[v[3]]
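
The one-to-many versus one-to-one distinction above can also be mimicked with plain Java streams, which may help if you are calling Gremlin from Java. This is only a rough sketch and does not use the TinkerPop API: the class name `MapVsFlatMap`, the `OUT` adjacency map, and the three helper methods are hypothetical, and the adjacency is simply read off the console output shown above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MapVsFlatMap {
    // Hypothetical stand-in for the toy graph: vertex id -> ids reached by out().
    // Read off the console output above; this is not the TinkerPop API.
    static final Map<Integer, List<Integer>> OUT = new LinkedHashMap<>();
    static {
        OUT.put(1, List.of(3, 2, 4));
        OUT.put(2, List.of());
        OUT.put(3, List.of());
        OUT.put(4, List.of(5, 3));
        OUT.put(5, List.of());
        OUT.put(6, List.of(3));
    }

    // Analogous to g.V().out() or g.V().flatMap(out()):
    // one-to-many, every adjacent vertex is emitted.
    static List<Integer> flatMapOut() {
        return OUT.keySet().stream()
                .flatMap(v -> OUT.get(v).stream())
                .collect(Collectors.toList());
    }

    // Analogous to g.V().map(out()): at most one result per vertex (the first).
    // Stream.map() cannot drop elements, so the "no result" case is emulated
    // with flatMap over a stream limited to one element.
    static List<Integer> mapOut() {
        return OUT.keySet().stream()
                .flatMap(v -> OUT.get(v).stream().limit(1))
                .collect(Collectors.toList());
    }

    // Analogous to g.V().map(out().fold()):
    // exactly one list per vertex, possibly empty.
    static List<List<Integer>> mapOutFold() {
        return OUT.keySet().stream()
                .map(v -> OUT.get(v))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(flatMapOut());  // [3, 2, 4, 5, 3, 3]
        System.out.println(mapOut());      // [3, 5, 3]
        System.out.println(mapOutFold());  // [[3, 2, 4], [], [], [5, 3], [], [3]]
    }
}
```

The three lists line up with the three console results above, which is the point of the analogy: the only thing that changes between them is how many objects each incoming vertex is allowed to produce.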

The question is whether I should be organizing my code this way, i.e. passing traversals around and using map() with traversals accepted from other places in the code. — Ram, Jun 25 '18 at 17:02
I don't think it's wrong to construct anonymous traversals from functions and then use them in `map()`, but it might make your Gremlin hard to read. Depending on exactly what you're doing, you might also make it harder for query analyzers to properly optimize your traversals. If you have reusable bits of Gremlin, then you might be better off building a DSL: https://www.datastax.com/dev/blog/gremlin-dsls-in-java-with-dse-graph — stephen mallette, Jun 25 '18 at 17:21
