23

I am out of ideas and hope to get some useful input. I am using this question to condense my experiences and share them, hoping to inspire some vendors to take the next step and treat graph data modeling as a first-class concern.

For a few weeks I have been evaluating graph database solutions that can be used from Node.js. My use case is storing the interactions between users' accounts on different social networks. The goal is to use CPU and memory as efficiently as possible.

My most important requirements are:

  • in_memory (at least for indexing)
  • open source (and free to use)
  • JavaScript/Node.js as a first-class citizen, with performance comparable to native access
  • comfortable query and modeling language

Neo4J

I really like Cypher, so Neo4j would be my first choice. But the major issue with Neo4j is that JavaScript access is not native. It goes through the REST API, which is about ten times slower than direct Java access. So I took a look at node-neo4j-embedded, but it has been inactive for more than two years, and its author does not seem to be active at all (a bad sign).
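To illustrate what non-native access means in practice, here is a minimal sketch of sending one Cypher statement from Node.js over HTTP. It assumes a Neo4j 2.x-style transactional endpoint at /db/data/transaction/commit on localhost:7474 with authentication disabled; the helper names buildCypherRequest and runCypher are hypothetical and not part of any driver.

```javascript
const http = require("http");

// Build the JSON body expected by the transactional Cypher endpoint.
function buildCypherRequest(statement, parameters) {
  return { statements: [{ statement: statement, parameters: parameters || {} }] };
}

// Send a single statement over HTTP. Host, port, path and the missing
// authentication are assumptions for a local test server.
function runCypher(statement, parameters, callback) {
  const body = JSON.stringify(buildCypherRequest(statement, parameters));
  const req = http.request({
    host: "localhost",
    port: 7474,
    path: "/db/data/transaction/commit",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Accept": "application/json",
      "Content-Length": Buffer.byteLength(body)
    }
  }, function (res) {
    let raw = "";
    res.setEncoding("utf8");
    res.on("data", function (chunk) { raw += chunk; });
    res.on("end", function () {
      let parsed;
      try {
        parsed = JSON.parse(raw);
      } catch (e) {
        return callback(e);
      }
      callback(null, parsed);
    });
  });
  req.on("error", callback);
  req.end(body);
}

// Print the payload without touching the network.
console.log(JSON.stringify(buildCypherRequest("MATCH (n) RETURN count(n)")));
```

Every query pays for JSON serialization, an HTTP round trip and JSON parsing, which is the overhead an embedded (in-process) API would avoid.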

ArangoDB

The really nice core developers of ArangoDB answered my question about internals. In the end it means that JavaScript is a first-class citizen, because native queries can be issued directly from JS. Looking at the open source benchmarks, I think they are fair. But I am afraid they didn't use node-neo4j-embedded for their benchmark: the benchmarks compare the REST APIs (edited because of @weinberger's comment). I wish they had compared the native APIs (maybe someone is curious enough to give it a try; let us know!). Update: As I have now noticed, OrientDB has answered the benchmark with a new Node.js driver (using the Command Cache by starting the server with -Dcommand.cache.enabled=true -Dcommand.cache.minExecutionTime=3, which isn't fair, because it wasn't a query cache benchmark!).

Because I would like to use ArangoDB as a graph database, I would have three choices (source: FAQ):

In general it isn't as comfortable as Cypher. And I am not sure how to compare them, or what the right way to model the data is (something Neo4j explains very well). I'd love to have something like that for ArangoDB graphs. It feels like ArangoDB is focused on graph operations, while Neo4j better fits the need to use graphs when you have more relations than rows (the reason to use graphs instead of relations with joins).
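For a rough feel of the difference in query style, here is a sketch of an AQL query wrapped in a request for ArangoDB's HTTP cursor API (POST /_api/cursor). It assumes the AQL traversal syntax introduced with ArangoDB 2.8, so it does not apply to older versions; the edge collection name interactions, the vertex id accounts/a and the helper buildNeighborsRequest are hypothetical.

```javascript
// Describe a cursor request that returns the direct outbound neighbours
// of one vertex. The collection name "interactions" is an assumption.
function buildNeighborsRequest(startVertexId) {
  return {
    method: "POST",
    path: "/_api/cursor",
    body: {
      // Roughly comparable Cypher: MATCH (a)-[e]->(v) RETURN v, e
      query: "FOR v, e IN 1..1 OUTBOUND @start interactions RETURN { vertex: v, edge: e }",
      // Bind parameters keep the start vertex out of the query string.
      bindVars: { start: startVertexId }
    }
  };
}

const neighborsRequest = buildNeighborsRequest("accounts/a");
console.log(JSON.stringify(neighborsRequest.body, null, 2));
```

The object only describes the request; sending it still goes over HTTP, as in the benchmark setup discussed above.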

MongoDB

The document-based MongoDB isn't optimized for graph operations, but it recently got an experimental in_memory storage engine. There are also some projects that are either in_memory or graph related, but nothing is really compelling. And judging by this discussion, MongoDB isn't what I would like to use.

OrientDB

Because there is a comparison of OrientDB vs. MongoDB available (from OrientDB), I thought about using this one. "OrientDB has a hybrid Document-Graph engine" using SQL, and I am a former PHP/MySQL expert. But where is the modeling part? Their chapter on working with graphs is not Cypher-like. It is like using SQL for graphs. There is nothing wrong with that, but having used Cypher before, I miss the feeling of modeling. If someone has gone through a modeling process with OrientDB and graphs, maybe you could write a tutorial like the one Neo4j has.

Update: Regarding JavaScript access as a first-class citizen, there is news: "In the next release the speed of this driver will be comparable to the native Java one". The forked Node.js driver has been fixed in the last few days.

Update: Before choosing OrientDB, one might want to read an article about some issues, along with the discussions linked from there. The article touches on a sensitive issue and should be read with a critical mind. Note from the author of this update: I'm new to editing SO and don't have enough reputation to post this as a comment. I believe this information is a valid point for discussion, but I'm not sure how to place it here according to SO rules.

LokiJS

Before I looked at Neo4j, ArangoDB and MongoDB, I played around with a JavaScript-based in_memory database called LokiJS, which seems to follow the strategy of ignoring everything that slows down performance and efficiency. LokiJS is trying to complete its Mongo-style API (RoadMap). The major issue is that it scales poorly. Of course it isn't a graph database, but it was an interesting solution at the beginning of my project. It also wasn't a great experience to hunt through the scattered documentation (maybe they should reboot it with GitBook). All in all, LokiJS is a very interesting project and I hope they keep moving forward!

LevelDB

Previously, when I wrote my degree thesis, I was looking at LevelDB. Remembering this while writing this post, I searched for an in_memory LevelDB and got a promising result called MemDown (see also). I haven't tested this find, but maybe someone has experience working and modeling with this solution. It might be the most efficient way if none of the others fit, because I would simply write a lightweight Cypher clone with the goal of staying as lightweight as I can.

Edit: In response to a comment, here is a link to LevelGraph. If you want to implement a Cypher parser for LevelGraph/LevelDB, your starting point would be to compare:

Cypher:

CREATE (subject {name: "a"})-[predicate:b]->(object {name: "c"})
RETURN subject, predicate, object

LevelGraph:

// db is assumed to be an open LevelGraph instance
var triple = { subject: "a", predicate: "b", object: "c" };
db.put(triple, function(err) {
  // handle err, or continue once the triple is stored
});
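To make the comparison above concrete, here is a minimal sketch of the parsing step for such a lightweight Cypher clone. It handles only one hypothetical mini-syntax, (subject)-[:predicate]->(object), which is loosely modeled on Cypher's ASCII-art patterns and is not real Cypher; the function names patternToTriple and tripleToPattern are made up for illustration.

```javascript
// Matches exactly one directed pattern such as "(a)-[:b]->(c)".
// Whitespace around the tokens is allowed; nothing else of Cypher is.
const PATTERN = /^\s*\(\s*(\w+)\s*\)\s*-\s*\[\s*:\s*(\w+)\s*\]\s*->\s*\(\s*(\w+)\s*\)\s*$/;

// Convert a pattern string into the triple shape LevelGraph stores.
function patternToTriple(pattern) {
  const match = PATTERN.exec(pattern);
  if (!match) {
    throw new SyntaxError("Unsupported pattern: " + pattern);
  }
  return { subject: match[1], predicate: match[2], object: match[3] };
}

// Convert a triple back into the pattern notation.
function tripleToPattern(triple) {
  return "(" + triple.subject + ")-[:" + triple.predicate + "]->(" + triple.object + ")";
}

const parsedTriple = patternToTriple("(a) - [:b] -> (c)");
console.log(parsedTriple);
console.log(tripleToPattern(parsedTriple));
```

The resulting object could then be passed to db.put as in the LevelGraph snippet. A real clone would also need variables, properties and multi-hop paths, which this sketch deliberately leaves out.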

Conclusion

As you have likely noticed, I am no superhero when it comes to graphs. But this is my first dive into the topic and I'm trying to get an overview. I assume there are a lot of people out there who want to ask the same questions as me but don't have the time. I hope this post will help a lot of people and will evolve through comments and answers into a well-made overview of how to model data for graphs.


@editors: You are welcome.

@commenters: This is the result of my personal research. If you have also made a journey like mine, please answer with a short summary like the ones I have written for each DB I've evaluated (don't forget to address my 4 requirements).

Danny
  • 1
    While closing all my browser tabs I stumbled upon redis-graph: https://www.npmjs.com/package/redis-graph - any experiences out there? – Danny Jul 22 '15 at 14:04
  • 5
    I'm from ArangoDB, so I can only answer the questions about ArangoDB. Concerning the benchmark: We didn't test node-neo4j-embedded because it is inactive; the recommended driver is "neo4j", see http://j.mp/1HRIFwz, and we wanted a client/server setup. The communication between node and ArangoDB is always HTTP. We used CYPHER for Neo4J and AQL for ArangoDB - the communication is HTTP in both cases. I agree that CYPHER is more concise than the current AQL. We will add syntactic sugar to make queries much easier. The data modeling aspect is best discussed using email, claudius at arangodb.com – weinberger Jul 22 '15 at 15:43
  • 1
    @weinberger: I've edited the post to be fair. Thanks for your feedback. You are right to point out this difference. As I also added, I would love to see the comparison with node-neo4j-embedded and direct AQL queries from node.js. – Danny Jul 23 '15 at 06:15
  • 2
    We will see if we can test your environment. You should also mention that a query cache is not part of the benchmark definition. Other products also have a query cache, which was not used. If I use the query cache in ArangoDB, for example, the aggregation takes only 2ms for the second call. We also updated our last post about the benchmark. – weinberger Jul 23 '15 at 10:10
  • 1
    @weinberger: YES, you are absolutely right! Using query caches isn't fair (and that should be pointed out). I have **added a notice** to the post. – Danny Jul 23 '15 at 12:52
  • @weinberger: About modeling data, I read http://radar.oreilly.com/2015/07/data-modeling-with-multi-model-databases.html - Based on this, and thinking about OrientDB's SQL approach to working with multi-model data, Neo4J's Cypher approach should be the yardstick for adding a BIG dose of syntactic sugar, for one very important reason: Students still learn ERD ... And academics have to teach non-academic programmers. Simple entry points always WIN. What you provide in deeper layers is for professionals (and support). – Danny Jul 23 '15 at 16:06
  • This will likely be closed as off topic (read help pages for why) but if you want to make a comprehensive comparison you should include Titan and perhaps Oracle Graph, some RDF engines (Jena, Sesame, Allegro, Virtuoso) and maybe VertexDB and InfiniteGraph as well. – jjaderberg Jul 23 '15 at 17:39
  • @jjaderberg: Thanks for your note! It sounds like you are experienced. Please write an answer in the same style as my question. This would complete the list of solutions. Don't forget to address my 4 major requirements (and Cypher-like RDF modeling). @all: other experts are welcome to do the same. – Danny Jul 23 '15 at 20:41
  • 2
    I'm currently working with Neo4j and the neo4j-io module ... from my perspective the performance, even with the REST overhead, is pretty awesome. You should provide some requirements or metrics or any other numbers on the assumed usage before you state that something is too slow. – pagid Jul 24 '15 at 18:50
  • +1 @pagid https://www.npmjs.com/package/neo4j-io looks interesting with promises - cool! But my intent in using "native" access from NodeJS is to prevent unnecessary bottlenecks. For the most common use cases you would be right, the REST API is awesome. But NodeJS wouldn't be able to beat Java that way. Especially if you want to use it with Tessel 2, for example. – Danny Jul 24 '15 at 20:02
  • 2
    Have you looked at [levelgraph](https://www.npmjs.com/package/levelgraph) yet? Sounds like a treat. But I personally don't like the query API. That may just be me, though, being too new to the topic. – eljefedelrodeodeljefe Sep 14 '15 at 11:42
  • 1
    @eljefedelrodeodeljefe YES, that is what I meant when I mentioned LevelDB and my thesis. If you need to model your data like CYPHER, you simply write (SUBJECT) - [:PREDICATE] -> (OBJECT) .... If you write a node_module, please link it here. If someone paid me and allowed me to keep it MIT licensed, I would do it ;) – Danny Sep 16 '15 at 12:03
  • @Danny: Did you try [AQL traversals](https://www.arangodb.com/2016/01/arangodb-2-8/) new in ArangoDB 2.8? – CodeManX Apr 17 '16 at 14:40

2 Answers

2

The idea to combine node-style performance through any of the native features (e.g. streams) and a high level query language like CYPHER is actually quite neat.

What you likely won't get is any kind of low-level API, since this is rather rare among DB authors and, supposedly, not wanted in their design patterns. So long-running TCP connections should serve just fine.

cypher-stream seems to incorporate all of this while (judging superficially) maintaining a good style.

Since you likely won't get any further with the search, I'd suggest sending its author a pull request if any other features are needed :)

eljefedelrodeodeljefe
  • 1
    Thanks for your answer. I think my basic need is to find a solution that brings together a teachable methodology for clean modeling and access to a low-level API for doing it. If someone is doing small things, low level is OK - but bigger tasks need a professional methodology for modeling and documentation. And because node has become important, the low-level idea should be combined with a clean, teachable methodology for planning efficient databases. My personal research has stopped for the moment, but I hope this discussion will push the competitors to work on this. – Danny Dec 10 '15 at 10:03
0

You should take a look at Gundb: https://github.com/amark/gun. It's open source and has a very active and helpful lead developer.

Join us at https://gitter.im/amark/gun