
I would like to know whether it is worth using a graph database specifically to handle relationships.

I intend to use a relational database to store entities like "User", "Page", "Comment", "Post", etc.

But in a typical social-graph workload, most queries need deep traversals, which relational databases do not handle well because they involve many slow joins.

Example: Comment -(made_in)-> Post -(made_in)-> Page, etc.
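To make the join cost concrete, here is a minimal Python sketch using an in-memory SQLite database. The table and column names are hypothetical, not from the question. Each made_in hop is stored as a foreign key, so following Comment -> Post -> Page takes one JOIN per hop, and every extra level of depth adds another JOIN.

```python
import sqlite3

# Hypothetical schema: each made_in hop is a foreign key column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages    (page_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE posts    (post_id INTEGER PRIMARY KEY,
                           page_id INTEGER REFERENCES pages(page_id));
    CREATE TABLE comments (comment_id INTEGER PRIMARY KEY,
                           post_id INTEGER REFERENCES posts(post_id));
    INSERT INTO pages    VALUES (1, 'Graph databases');
    INSERT INTO posts    VALUES (10, 1);
    INSERT INTO comments VALUES (100, 10);
""")

def page_of_comment(comment_id):
    # Two hops (comment -> post -> page) cost two JOINs in SQL.
    return conn.execute("""
        SELECT pages.page_id, pages.title
        FROM comments
        JOIN posts ON posts.post_id = comments.post_id
        JOIN pages ON pages.page_id = posts.page_id
        WHERE comments.comment_id = ?
    """, (comment_id,)).fetchone()

print(page_of_comment(100))  # (1, 'Graph databases')
```

Whether these joins are actually slow depends on indexing and data volume; the point is that the query grows with the depth of the traversal.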

I'm thinking of doing something like this:

Example:

User id: 1

Query: Get all followers of user_id 1

  • Query Neo4j for all incoming "follows" edges of the user node with id 1 (i.e. the users who follow user 1)
  • With the resulting list of ids, query the Users table:

    SELECT * FROM users WHERE user_id IN (ids)
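The two-step approach above can be sketched in Python as follows. This is a sketch under stated assumptions: a plain list of (follower, followed) pairs stands in for Neo4j (a real setup would send a Cypher query through a Neo4j driver), SQLite stands in for MySQL, and the names FOLLOWS_EDGES, follower_ids and get_followers are hypothetical. Followers of a user correspond to the incoming "follows" edges.

```python
import sqlite3

# Stand-in for Neo4j (hypothetical): directed "follows" edges as (follower, followed).
FOLLOWS_EDGES = [(2, 1), (3, 1), (1, 4), (5, 1)]

def follower_ids(user_id):
    # Followers of user_id are the *incoming* "follows" edges.
    return [src for src, dst in FOLLOWS_EDGES if dst == user_id]

# Stand-in for the relational Users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cid"), (4, "Dee"), (5, "Eve")])

def get_followers(user_id):
    ids = follower_ids(user_id)            # round trip 1: the graph store
    if not ids:
        return []                          # MySQL rejects an empty IN () list
    placeholders = ", ".join("?" for _ in ids)
    sql = ("SELECT user_id, name FROM users "
           f"WHERE user_id IN ({placeholders}) ORDER BY user_id")
    return conn.execute(sql, ids).fetchall()  # round trip 2: the relational store

print(get_followers(1))  # [(2, 'Bob'), (3, 'Cid'), (5, 'Eve')]
```

Each call makes two separate round trips, one per store, plus the work of building the IN list; that extra hop between systems is the overhead to weigh against keeping everything in one database.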

Is this slow?

I have seen the question "Is it a good idea to use MySQL and Neo4j together?", but I still cannot understand why its accepted answer says it is not a good idea.

Thanks

Luccas
  • "Good idea" is extremely subjective. If your primary goal is performance then I'd say it's a bad idea. It would be significantly more efficient to just store all your user data in neo4j. Do you have a specific use case for having a polyglot system? – ean5533 Apr 05 '13 at 20:54

3 Answers


Neo4j is a great technology choice for an application like yours that requires deep traversals. It is a good choice for two reasons: first, the Cypher language makes such queries very easy to write; second, deep traversals run very quickly because of the way the data is structured in the database.

To reap both of these benefits, you will want to have both the relationships and the people (as nodes) in the graph. Then you'll be able to do a friend-of-friend query as follows:

START john=node:node_auto_index(name = 'John') MATCH john-[:friend]->()-[:friend]->fof RETURN john, fof

and a friend-of-friend-of-friend query as follows:

START john=node:node_auto_index(name = 'John') MATCH john-[:friend]->()-[:friend]->()-[:friend]->fofof RETURN john, fofof

...and so on. (The same idea applies to posts and comments; just replace the relationship type.)
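For readers without Neo4j at hand, here is a rough Python sketch of what these fixed-depth patterns compute, using a hypothetical in-memory adjacency map. Each step expands only the neighbours of the current frontier, which loosely mirrors how Neo4j follows stored relationships directly from node to node (often described as index-free adjacency) instead of joining tables. It is not equivalent to Cypher in every detail: Cypher returns one row per matching path and does not reuse the same relationship twice within one match, whereas this sketch returns a set of distinct end nodes.

```python
from collections import defaultdict

# Hypothetical "friend" graph: directed edges person -> friend.
friend_edges = [("John", "Ann"), ("John", "Bob"), ("Ann", "Cid"),
                ("Bob", "Cid"), ("Cid", "Dee")]

graph = defaultdict(set)
for src, dst in friend_edges:
    graph[src].add(dst)

def reachable_in_exactly(start, hops):
    """Distinct nodes at the end of a walk of exactly `hops` friend edges."""
    frontier = {start}
    for _ in range(hops):
        # Only neighbours of the current frontier are touched at each step.
        frontier = {nxt for node in frontier for nxt in graph[node]}
    return frontier

print(sorted(reachable_in_exactly("John", 2)))  # ['Cid']  (friend-of-friend)
print(sorted(reachable_in_exactly("John", 3)))  # ['Dee']  (friend-of-friend-of-friend)
```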

Using Neo4j alongside MySQL is fine, but I wouldn't do it in this particular way, because the code will be much more complex, and you'll lose too much time hopping between Neo4j and MySQL.

Best of luck!

Philip

Philip Rathle

In general, the more databases/systems/layers you've got, the more complex the overall setup and operating will be.

Think about tasks like synchronization, export/import, and backup/archiving, which become quite expensive as your database(s) grow in size.

People use polyglot persistence only if the benefits of having dedicated, specialized databases outweigh the drawbacks of coping with multiple data stores. For example, this can be the case if you have a large number of data items (e.g. activity or transaction logs), each related to a user. It would probably make no sense to store all that information in a graph database if you're only interested in the connections between the data items. You would be better off storing only the relationships in the graph (with each node holding just a pointer into the other database) and the per-item data in a K/V store or similar.
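The split described above, where the graph holds only the links plus a pointer and a key/value store holds the per-item data, can be sketched like this in Python. Plain dicts stand in for both stores, and the "user:1" / "log:a" key format and the activity_for function are hypothetical conventions for illustration, not something the answer prescribes.

```python
# Hypothetical graph side: only ids and the links between them.
graph_links = {
    "user:1": ["log:a", "log:b"],   # user 1 -> their activity log entries
    "user:2": ["log:c"],
}

# Hypothetical K/V side: the bulky per-item payloads, keyed by the same ids.
kv_store = {
    "log:a": {"action": "login",  "ts": 1},
    "log:b": {"action": "post",   "ts": 2},
    "log:c": {"action": "logout", "ts": 3},
}

def activity_for(user_key):
    # Step 1: the graph lookup returns only pointers (keys).
    keys = graph_links.get(user_key, [])
    # Step 2: resolve each pointer against the K/V store, skipping dangling ones.
    return [kv_store[k] for k in keys if k in kv_store]

print(activity_for("user:1"))
# [{'action': 'login', 'ts': 1}, {'action': 'post', 'ts': 2}]
```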

For your example use case, I would use only one database, namely Neo4j, because your data is a graph.

Axel Morgner

As the other answers indicate, using Neo4j as your single data store is preferable. However, in some cases you may not have much choice, for example when you already have another database behind your product. If that is the case, I would just like to add that running Neo4j as your secondary database does work (the product I work on operates in this mode). You do have to work extra hard to figure out what functionality you expect from Neo4j, what data it needs, how to keep that data in sync, and how to live with results that are not always real time. Most of our use cases work fine with near-real-time results, but that may not be the case for your product. Still, to me, using Neo4j in this mode is preferable to running without it: we are able to build a lot of great graph-based features as a result.

Luanne