124

I'm starting to develop with Neo4j using the REST API. I saw that there are two options for performing complex queries - Cypher (Neo4j's query language) and Gremlin (the general purpose graph query/traversal language).

Here's what I want to know - is there any query or operation that can be done with Gremlin but not with Cypher, or vice versa?

Cypher seems much clearer to me than Gremlin, and in general it seems that the Neo4j team is going with Cypher. But if Cypher is limited compared to Gremlin, I would really like to know that in advance.

Taja Jan
Rubinsh
  • 3
    Cypher is a non-Turing-complete declarative language. Gremlin is a fancy wrapper over the Neo4j Java API and is imperative. Clearly, there are things you can do in Gremlin that you cannot do in Cypher. – Prakhar Agrawal Aug 27 '17 at 19:10
  • 1
    Apache Spark 3 will include Cypher, which says a lot about their view on that. – Walker Rowe Mar 15 '19 at 15:15
  • @PrakharAgrawal Gremlin allows both imperative and declarative styles. For example, the `match()` step is declarative https://tinkerpop.apache.org/docs/3.5.2/reference/#match-step – jbmusso Mar 19 '22 at 14:07

9 Answers

99

For general querying, Cypher is enough and is probably faster. The advantage of Gremlin over Cypher appears when you get into more advanced traversals. In Gremlin, you can define the exact traversal pattern (or your own algorithms) yourself, whereas in Cypher the engine tries to find the best traversal strategy on its own.
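To make the contrast concrete, here is a minimal Python sketch over a tiny in-memory graph. The data and the helper names (`out`, `friends_of_friends_imperative`, `match_two_hop`) are invented for this illustration and are not part of Neo4j or TinkerPop. The first function spells out each hop, roughly the way a Gremlin traversal such as `g.V().has('name','alice').out('KNOWS').out('KNOWS')` does. The second only states the pattern and leaves the search strategy to a deliberately naive matcher, which is closer in spirit to a Cypher `MATCH`.

```python
# Toy directed graph as (source, relationship type, target). Illustrative data only.
EDGES = [
    ("alice", "KNOWS", "bob"),
    ("alice", "KNOWS", "carol"),
    ("bob", "KNOWS", "dave"),
    ("carol", "KNOWS", "erin"),
    ("carol", "WORKS_WITH", "frank"),
]


def out(nodes, rel_type):
    """One explicit traversal step: follow outgoing edges of the given type."""
    return [dst for node in nodes for (src, rel, dst) in EDGES
            if src == node and rel == rel_type]


def friends_of_friends_imperative(start):
    """Imperative style: the caller decides every hop and its order."""
    first_hop = out([start], "KNOWS")
    return sorted(out(first_hop, "KNOWS"))


def match_two_hop(start, rel_type):
    """Declarative style: state the pattern (start)-[rel]->(b)-[rel]->(c).

    This naive matcher tries every pair of edges. A real query planner
    would pick an evaluation order from indexes and statistics instead.
    """
    return sorted(
        c
        for (a, r1, b) in EDGES
        for (b2, r2, c) in EDGES
        if a == start and r1 == rel_type and r2 == rel_type and b == b2
    )


if __name__ == "__main__":
    print(friends_of_friends_imperative("alice"))  # ['dave', 'erin']
    print(match_two_hop("alice", "KNOWS"))         # ['dave', 'erin']
```

Both functions return the same answer; what differs is who chooses how the graph is walked.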

I personally use Cypher because of its simplicity and, to date, I have not had any situation where I had to use Gremlin (except for working with Gremlin's GraphML import/export functions). I expect, however, that even if I did need Gremlin, it would be for a specific query that I would find on the net and never come back to again.

You can always learn Cypher quickly (in days) and then move on to the more general Gremlin, which takes longer to learn.

Ambrose Leung
ulkas
  • 4
    There is a new online tutorial beginning at http://www.neo4j.org/learn/cypher for you to get going also. – Peter Neubauer Dec 12 '12 at 12:47
  • 3
    I had the understanding that Cypher was more like SQL, in that you tell it what you want, and it works out how to do it. With Gremlin, you issue exact traversal commands, which it must obey. – Stewart Feb 08 '13 at 12:49
  • 2
    For me Gremlin happened to be significantly faster than Cypher in most queries. – Joan Jun 10 '14 at 14:19
  • 13
    As of [TinkerPop 3.x](http://tinkerpop.apache.org/), Gremlin has both imperative and declarative characteristics. You can write your traversals to define an exact traversal pattern as stated in this answer or you can use [match step](http://tinkerpop.apache.org/docs/3.1.1-incubating/reference/#match-step) to simply define the pattern you are looking for and Gremlin will solve for that. – stephen mallette Apr 05 '16 at 11:58
  • 1
    There is always the option to write a Cypher extension to alleviate Cypher limitations. APOC (https://github.com/neo4j-contrib/neo4j-apoc-procedures), for example, offers a nice collection of extensions. Authoring one is quite straightforward: https://neo4j.com/docs/java-reference/current/extending-neo4j/procedures-and-functions/introduction/#extending-neo4j-procedures-and-functions-introduction – fbiville Jan 19 '21 at 11:18
51

We have to traverse thousands of nodes in our queries, and Cypher was slow. The Neo4j team told us that implementing our algorithm directly against the Java API would be 100-200 times faster. We did so and easily got a factor of 60 out of it. As of now we have not a single Cypher query in our system, due to lack of confidence: easy Cypher queries are just as easy to write in Java, and complex queries won't perform.

The problem is that when a query has multiple conditions, there is no way in Cypher to say in which order the traversals should be performed, so a Cypher query may head off into the graph in the wrong direction first. I have not done much with Gremlin, but I imagine you get much more execution control with it.
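Why the order matters can be sketched with a small Python example. Everything here is hypothetical (the data, the dictionaries standing in for indexes, and the function names); it does not measure Neo4j, it only counts how many nodes each strategy touches. The query is "people who like `widget` and live in `Smallville`", where the product is popular and the city is rare.

```python
from collections import defaultdict

# Hypothetical data: 1000 people like one popular product, only 3 live in Smallville.
likes = defaultdict(set)         # product -> people who like it
lives_in = defaultdict(set)      # city -> people who live there
person_city = {}                 # person -> city
person_likes = defaultdict(set)  # person -> products they like

for i in range(1000):
    person = f"p{i}"
    city = "Smallville" if i < 3 else "Metropolis"
    likes["widget"].add(person)
    person_likes[person].add("widget")
    lives_in[city].add(person)
    person_city[person] = city


def start_from_product(product, city):
    """Expand from the popular product first, then filter by city."""
    visited = 0
    result = set()
    for person in likes[product]:
        visited += 1
        if person_city[person] == city:
            result.add(person)
    return result, visited


def start_from_city(product, city):
    """Expand from the rare city first, then filter by product."""
    visited = 0
    result = set()
    for person in lives_in[city]:
        visited += 1
        if product in person_likes[person]:
            result.add(person)
    return result, visited


if __name__ == "__main__":
    print(start_from_product("widget", "Smallville")[1])  # 1000 nodes touched
    print(start_from_city("widget", "Smallville")[1])     # 3 nodes touched
```

Both strategies return the same three people, but one touches 1000 nodes and the other 3. With a hand-written traversal you pick the starting point yourself; with a declarative query that choice is up to the planner.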

Heinrich
  • When you say "directly against the Java API" do you mean Neo4j embedded in Java? – Pavel Jul 11 '14 at 02:22
  • 2
    Using server extensions within neo4j installed as a standalone server. – Heinrich Jul 15 '14 at 22:34
  • 15
    Update from 2018 - given a large range of native index types in modern versions of neo4j, this answer is substantially out of date; neo4j has published performance numbers – FrobberOfBits Aug 01 '18 at 18:01
  • 5
    "implementing our algorithm directly against the Java API" is actually a little bit misleading. Obviously, the fastest way to get from point A to point B is to take the shortest path. That requires knowing additional, specific, information. Going low level will always outperform a machine planner, because you know you can make assumptions the machine can't. However, Cypher can easily outperform a naively implemented low-level algorithm, requires a lot less knowledge to use, and is much faster to implement. Especially since Cypher gets better with each Neo4j release. (smarter planners) – Tezra Oct 18 '18 at 13:29
28

The Neo4j team's efforts on Cypher have been really impressive, and it's come a long way. The Neo team typically pushes people toward it, and as Cypher matures, Gremlin will probably get less attention. Cypher is a good long-term choice.

That said, Gremlin is a Groovy DSL. Using it through its Neo4j REST endpoint allows full, unfettered access to the underlying Neo4j Java API. It (and other script plugins in the same category) cannot be matched in terms of low-level power. Plus, you can run Cypher from within the Gremlin plugin.

Either way, there's a sane upgrade path where you learn both. I'd go with the one that gets you up and running faster. In my projects, I typically use Gremlin and then call Cypher (from within Gremlin or not) when I need tabular results or expressive pattern matching; both are a pain in the Gremlin DSL.

Matt Luongo
  • 1
    Note that as of 2022, Gremlin Groovy is one of the many language variants. Gremlin queries can be created and executed from multiple languages, including Python, JavaScript, C# and Java. https://tinkerpop.apache.org/docs/3.5.2/reference/#gremlin-drivers-variants - Groovy used to be the main and default implementation, but that's no longer the case. – jbmusso Mar 19 '22 at 16:29
20

I initially started using Gremlin. However, at the time, the REST interface was a little unstable, so I switched to Cypher. It has much better support for Neo4j. However, there are some types of queries that are simply not possible with Cypher, or where Cypher can't quite optimize the way you can with Gremlin.

Gremlin is built on Groovy, so you can actually use it as a generic way to get Neo4j to execute 'Java' code and perform various tasks on the server, without having to take the HTTP hit from the REST interface. Among other things, Gremlin will let you modify data.

However, when all I want is to query data, I go with Cypher as it is more readable and easier to maintain. Gremlin is the fallback when a limitation is reached.

Louis-Philippe Huberdeau
  • 1
    Cypher has support for updating queries as of Neo4j 1.7, see http://docs.neo4j.org/chunked/snapshot/cypher-query-lang.html – Peter Neubauer Dec 12 '12 at 12:48
  • 3
    Note that the REST interface will be going away in TinkerPop 3. Users will be expected to send strings of Gremlin to Gremlin Server (which is basically Rexster, renamed and improved). – jbmusso Sep 25 '14 at 16:05
11

Gremlin queries can be generated programmatically. (See http://docs.sqlalchemy.org/en/rel_0_7/core/tutorial.html#intro-to-generative-selects to know what I mean.) This seems to be a bit more tricky with Cypher.
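As a rough sketch of what "generative" means here, the following Python builder composes a Gremlin traversal step by step, with each call returning a new object so that partial queries can be reused. The `Traversal` class and its methods are hypothetical and written only for this example; this is not the gremlinpython API, and rendering values with `repr` is a shortcut for readability, not safe escaping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Traversal:
    """Hypothetical immutable builder: every method returns a new Traversal."""
    steps: tuple = ("V()",)

    def _add(self, name, *args):
        # repr() is used only to quote literals for display (illustration only).
        rendered = ", ".join(repr(arg) for arg in args)
        return Traversal(self.steps + (f"{name}({rendered})",))

    def has_label(self, label):
        return self._add("hasLabel", label)

    def has(self, key, value):
        return self._add("has", key, value)

    def out(self, rel_type):
        return self._add("out", rel_type)

    def values(self, key):
        return self._add("values", key)

    def render(self):
        """Render the accumulated steps as a Gremlin query string."""
        return "g." + ".".join(self.steps)


if __name__ == "__main__":
    people = Traversal().has_label("Person")  # reusable base query
    friends = people.has("name", "Alice").out("KNOWS").values("name")
    print(friends.render())
    print(people.render())  # the base query is unchanged
```

Because a Gremlin traversal is a chain of steps, extending a query is just appending another step, which is what makes this style of generation feel natural.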

Tohotom
  • @MattLuongo: 1, I did not know about neo4django, 2, it is not applicable in all cases (e.g. language is not Python) 3, it is not the same if you write the query programmatically yourself or you use a library to create the query programmatically for you. In this respect neo4django can be considered an alternative solution to Cypher and Gremlin. – Tohotom Feb 17 '14 at 14:52
  • 3
    Oh, of course I don't expect neo4django to be immediately applicable; it was an example, just as SQL Alchemy was in your answer. But it's not true that generating Cypher is *more* difficult. Cypher and Gremlin take different approaches as query languages, but I don't see how Cypher is any harder to generate programmatically... – Matt Luongo Feb 17 '14 at 15:19
9

Cypher is a declarative query language for querying graph databases. The term declarative is important because it is a different way of programming from paradigms such as imperative programming.

In a declarative query language like Cypher or SQL, we tell the underlying engine what data we want to fetch and we do not specify how we want the data to be fetched.

In Cypher, a user defines a subgraph of interest in the MATCH clause. The underlying engine then runs a pattern matching algorithm to search for occurrences of that subgraph in the graph database.

Gremlin has both declarative and imperative features. It is a graph traversal language where a user has to give explicit instructions as to how the graph is to be navigated.

The difference between these languages in this case is that in Cypher we can use a Kleene star operator to find paths between any two given nodes in a graph database. In Gremlin, however, we have to define such paths explicitly, although we can use the repeat operator to find multiple occurrences of those explicit paths. Iterating over explicit structures, on the other hand, is not possible in Cypher.
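The "repeat a step until a condition holds" idea can be sketched in plain Python. The graph and the `repeat_until` function below are made up for this illustration. The function roughly corresponds to a Cypher variable-length pattern such as `MATCH p = (a)-[:KNOWS*]->(b) RETURN p`, or to a Gremlin traversal along the lines of `g.V(a).repeat(out('KNOWS').simplePath()).until(hasId(b)).path()`; the exact path semantics of the two languages differ in details that this toy version ignores.

```python
# Toy adjacency list for a single relationship type. Illustrative data only.
GRAPH = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}


def repeat_until(start, target):
    """Repeat the 'follow one outgoing edge' step until the target is reached.

    Returns every simple path (no node visited twice) from start to target.
    """
    paths = []
    frontier = [[start]]
    while frontier:
        path = frontier.pop()
        node = path[-1]
        if node == target and len(path) > 1:
            paths.append(path)
            continue
        for nxt in GRAPH.get(node, []):
            if nxt not in path:  # keep the path simple, avoiding cycles
                frontier.append(path + [nxt])
    return sorted(paths)


if __name__ == "__main__":
    for path in repeat_until("a", "e"):
        print(" -> ".join(path))
```

On this graph the function finds the two paths a -> b -> d -> e and a -> c -> d -> e without the caller having to spell out how many hops to take.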

Brian Burns
Chandan Sharma
9

Cypher only works for simple queries. When you start incorporating complex business logic into your graph traversals it becomes prohibitively slow or stops working altogether.

Neo4j clearly knows that Cypher isn't cutting it, because they also provide the APOC procedures, which include an alternate path expander (apoc.path.expand, apoc.path.subgraphAll, etc.).

Gremlin is harder to learn but it's more powerful than Cypher and APOC. You can implement any logic you can think of in Gremlin.

I really wish Neo4j shipped with a toggleable Gremlin server (from reading around, this used to be the case). You can get Gremlin running against a live Neo4j instance, but it involves jumping through a lot of hoops. My hope is that since Neo4j's competitors are allowing Gremlin as an option, Neo4j will follow suit.

user1302130
  • 1
    neo4j being the most popular graph DB in the world, I think there might be a reason why they haven't adopted gremlin yet. – Luk Aron Nov 21 '19 at 07:11
  • 7
    since you don't share what those reasons might be, I don't see any value in your comment – user1302130 Nov 21 '19 at 18:38
5

If you use Gremlin, you can migrate to a different graph database later, since most graph databases support Gremlin traversals. That makes Gremlin a good choice.

Singaravelan
3

Long story short: use Cypher for querying and Gremlin for traversal. You will see the difference in response times yourself.