Perfomance SQL Server 2017 Graph vs Neo4j

Question

I am researching about graph databases. I stumbled into SQL Server 2017 and learned that they added the option to use a graph database. But I have some uncertainties about the performance. I watched several Youtube videos, tutorials and papers about this SQL Server 2017 Graph. For example this page.

With the image above in mind. When I try to find a node, is it true that the time complexity is O(n)? And is the performance in other graph databases like Neo4j similar? I am only talking about node lookup and not shortest path algorithms etc.

I also have a feeling that the graph functionality in SQL Server is just a relational database in disguise. Is this correct?

Thanks in advance.

"...graph functionality in SQL Server is just a relational database in disguise." - It is, as they say, "Lipstick on a pig". — Marj, Feb 16 '19 at 15:14
Choosing a graph database for a project depend on many different factors not just the speed. The Microsoft CosmosDB is a powerful Graph Engine. It is scalable to many different geo locations. But it is pricey compared to Azure SQL 2017. If you have already RDBMS data and want to do graph processing, Azure SQL 2017 is the right path. — wonderful world, Feb 18 '19 at 01:22

score 6 · Accepted Answer · answered Jan 03 '21 at 12:13

There is a big difference between a graph database and a relational database with graph capabilities, in the sense of how the data is stored.

To summarise simply, when a triple ( aka 2 nodes connected by a relationship ) is stored, the underlying database difference will be :

Neo4j, the triple is stored as a graph on disk, nodes have pointers to the relationships they have, so during retrieval it will just be pointer chasing from nodes
SQL like : one node is stored in one table, the other node is stored in another table, yet you can query as a graph but the operation will be really making a JOIN

Based on those two facts, we can say that in native graphs the join is performed at write time compared to having joins at query time in non-native graphs.

Be very careful when you hear distributed graphs, partitions, planet scale and the like. If you start having relationships that have to be traversed over the network you will always suffer performance issues. Most of the distributed graphs platforms note also that for maximum performance you have to store everything on one partition (which defeats the partitioning purpose).

But what if your 'graph' in SQL is in 1 indexed table? Than you don't do joins, right? — M1sterPl0w, Jan 23 '21 at 14:35
Well, I don't think you can represent `(person)-[:WORKS_FOR]->(company)->[:LOCATION]->(city)-[:IN_PROVINCE]->(province)<-[:LOOK_FOR_JOB_IN_PROVINCE]-(applicant)` in one table row. If you have one table, you will do joins across multiple rows ;-) — Christophe Willemsen, Jan 23 '21 at 18:15
Yeah you are right. I thought all nodes were saved in one table and all edges in one table. But you should store them per type (https://learn.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-architecture?view=sql-server-ver15). Stupid, this article is written in 2018, never found this before. But thanks for your answer! It helped a lot! — M1sterPl0w, Jan 25 '21 at 09:31

Perfomance SQL Server 2017 Graph vs Neo4j

1 Answers1