
I am looking at integrating Neo4j into a Clojure system I am building. The first question I was asked was why I didn't use Datomic. Does anyone have a good answer for this? I have heard of and seen videos on Datomic, but I don't know enough about Graph Databases to know the difference between Neo4j and Datomic, and what difference it would make to me?

yazz.com

1 Answer


There are a few fundamental differences between them:

Data Model

Both Neo4j and Datomic can model arbitrary relationships. They both use, effectively, an EAV (entity-attribute-value) schema, so they can model many of the same problem domains. The difference is that Datomic's EAV schema also embeds a time dimension (i.e. EAVT), which makes it very powerful if you want to perform efficient queries against your database at arbitrary points in time. This is something that non-immutable data stores (Neo4j included) simply cannot do.
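
For example, here is a minimal sketch of querying a past point in time with the Datomic peer API; the connection `conn` and the `:person/name` attribute are illustrative assumptions, not part of the question's schema:

```clojure
;; A minimal sketch using the Datomic peer API. `conn` and the
;; :person/name attribute are assumptions for illustration.
(require '[datomic.api :as d])

(let [db      (d/db conn)                       ; current database value
      db-2013 (d/as-of db #inst "2013-07-27")]  ; the database as of a past instant
  ;; the same query runs unchanged against either snapshot
  (d/q '[:find ?name
         :where [?e :person/name ?name]]
       db-2013))
```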

Data Access

Both Neo4j and Datomic provide traversal APIs and query languages:

Queries

Both Neo4j and Datomic provide declarative query languages (Cypher and Datalog, respectively) that support recursive queries. The difference is that Datomic's Datalog provides far superior querying capabilities by allowing custom filtering and aggregate functions to be implemented as arbitrary JVM code. In practice, this means Cypher's built-in functions can effectively be superseded by Clojure's sequence library. This is possible because your application, not the database, is the one running queries.
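
For instance, here is a sketch of calling an ordinary JVM function from inside a Datalog query; the database value `db` and the attribute name are illustrative assumptions:

```clojure
;; A minimal sketch, assuming a database value `db` with :person/name
;; (an illustrative attribute). Any function on the peer's classpath can
;; be called inside a :where clause; here clojure.string/starts-with?
;; acts as a custom filter.
(require '[datomic.api :as d])

(d/q '[:find ?name
       :where
       [?e :person/name ?name]
       [(clojure.string/starts-with? ?name "A")]]
     db)
```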

Traversal

Traversal APIs are always driven by application code, which means both Neo4j and Datomic are able to walk a graph using arbitrary traversal, filtering and data transformation code. The difference is that Neo4j requires a running transaction, which in practice means the traversal is time-bounded.
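
A sketch of what traversal-by-application-code looks like on the Datomic side, assuming an illustrative `:person/friends` ref attribute:

```clojure
;; A minimal sketch of traversal driven entirely by application code,
;; assuming a :person/friends ref attribute (illustrative). Datomic
;; entities are lazy maps, so plain Clojure functions do the walking;
;; no open transaction is required.
(require '[datomic.api :as d])

(defn friends-of-friends [db person-id]
  (let [person (d/entity db person-id)]
    (->> (:person/friends person)   ; direct friends (a set of entities)
         (mapcat :person/friends)   ; their friends
         (map :person/name)
         distinct)))
```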

Data Consistency

Another fundamental difference is that Datomic queries don't require database coordination (i.e. no read transactions) and always work with a consistent snapshot of the data. This means you can perform multiple queries and data transformations over an arbitrary period of time and be guaranteed that your results will always be consistent and that no transaction will time out (because there are none). Again, this is impossible in non-immutable data stores like the vast majority of existing databases (Neo4j included). The same applies to their traversal APIs.

Both Neo4j and Datomic are transactional (ACID) systems, but because Neo4j uses traditional interactive transactions (with optimistic concurrency controls), queries need to happen inside transactions (i.e. they need to be coordinated), which imposes timeout constraints on your queries. In practice, this means that for very complex, long-running queries you'll end up splitting your queries so they finish within certain time limits, giving up data consistency.
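
The snapshot point can be made concrete with a small sketch; `conn` and the attributes are illustrative assumptions:

```clojure
;; A minimal sketch: grab one immutable database value and run several
;; queries against it. However long this takes, both results reflect the
;; same point in time and there is no read transaction to time out.
(require '[datomic.api :as d])

(let [db (d/db conn)]   ; one consistent snapshot
  {:people (d/q '[:find (count ?e) . :where [?e :person/name]] db)
   :orders (d/q '[:find (count ?e) . :where [?e :order/id]] db)})
```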

Working Set

If for some reason your queries needed to involve a huge amount of data (more than would normally fit in memory) and you could not stream the results (Datomic does provide streaming APIs), Datomic would probably not be a good fit, since you wouldn't be taking advantage of Datomic's architecture: peers would be forced to constantly evict their working memory, perform additional network calls and decompress data segments.
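
As a sketch of the streaming-style access that is available, the raw index API can be consumed lazily; the `db` value and `:event/timestamp` attribute are illustrative assumptions:

```clojure
;; A minimal sketch of incremental access via the raw indexes, assuming
;; a db value and an :event/timestamp attribute (illustrative). d/datoms
;; returns a lazy iterable over an index, so results can be consumed
;; without realizing the whole set in memory.
(require '[datomic.api :as d])

(->> (d/datoms db :aevt :event/timestamp)
     (take 100)        ; consume only as much as needed
     (map :v))         ; the value component of each datom
```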

a2ndrade
  • Very well thought out description, thanks. Have you used both products? – yazz.com Jul 27 '13 at 17:01
  • @Zubair I've used Datomic. I'm familiar with Neo4j. As a side-note, look at https://github.com/datablend/blueprints, which is a set of graph interfaces usually implemented by graph databases to showcase some of their capabilities. Both Neo4j and Datomic implementations are there (although the Datomic implementation uses Java, not Clojure, so some things are not idiomatic). – a2ndrade Jul 27 '13 at 17:38
  • Just a comment on your description: "custom filtering and aggregate functions" can be achieved too with Neo4J. Cypher isn't the only way to query data (esp. in the JVM world), the traversal framework allows you to write *any* code to retrieve data and you can always fall back to lower-level APIs to achieve even more fine-grained retrievals. – fbiville Jul 28 '13 at 10:42
  • And about data consistency, you are absolutely right. This fundamental difference is very well illustrated in this Rich Hickey talk: http://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey. – fbiville Jul 28 '13 at 10:44
  • @Rolf what I meant to say was that Datomic's Datalog was far superior to Cypher (due to the custom functions) and that both offered a traversal API. I recognized it was poorly written so I updated it, thank you. You're right that you can use arbitrary code with the traversal API but that's because traversal is _always_ driven by application code, which _always_ runs locally. This is probably true for any other graph database. Again, I recognized it was poorly written so I updated it too. Thank you. – a2ndrade Jul 28 '13 at 14:14
  • Alright, I probably misunderstood what I first read ;) However, be aware that Cypher, although evolving pretty fast, is still a very young language. – fbiville Jul 28 '13 at 18:07
  • This is a very well thought-out answer, but for completeness, I would like to see someone with extensive experience in Neo4j weigh in. The accepted answer seems biased toward Datomic. – Ben Aug 08 '15 at 23:29
  • Is it still relevant today (after 5 years)? – Piyush Katariya May 19 '18 at 07:40
  • FYI, datomic has recently added the [qseq](https://docs.datomic.com/cloud/query/query-executing.html#qseq) api that facilitates working with larger working sets – Erich Oliphant Jul 19 '20 at 18:07