7

I am researching about graph databases. I stumbled into SQL Server 2017 and learned that they added the option to use a graph database. But I have some uncertainties about the performance. I watched several Youtube videos, tutorials and papers about this SQL Server 2017 Graph. For example this page.

Image, lookup in SQL-server

With the image above in mind. When I try to find a node, is it true that the time complexity is O(n)? And is the performance in other graph databases like Neo4j similar? I am only talking about node lookup and not shortest path algorithms etc.

I also have a feeling that the graph functionality in SQL Server is just a relational database in disguise. Is this correct?

Thanks in advance.

Jerry Nixon
  • 31,313
  • 14
  • 117
  • 233
M1sterPl0w
  • 193
  • 3
  • 14
  • "...graph functionality in SQL Server is just a relational database in disguise." - It is, as they say, "Lipstick on a pig". – Marj Feb 16 '19 at 15:14
  • Choosing a graph database for a project depend on many different factors not just the speed. The Microsoft CosmosDB is a powerful Graph Engine. It is scalable to many different geo locations. But it is pricey compared to Azure SQL 2017. If you have already RDBMS data and want to do graph processing, Azure SQL 2017 is the right path. – wonderful world Feb 18 '19 at 01:22

1 Answers1

6

There is a big difference between a graph database and a relational database with graph capabilities, in the sense of how the data is stored.

To summarise simply, when a triple ( aka 2 nodes connected by a relationship ) is stored, the underlying database difference will be :

  • Neo4j, the triple is stored as a graph on disk, nodes have pointers to the relationships they have, so during retrieval it will just be pointer chasing from nodes
  • SQL like : one node is stored in one table, the other node is stored in another table, yet you can query as a graph but the operation will be really making a JOIN

Based on those two facts, we can say that in native graphs the join is performed at write time compared to having joins at query time in non-native graphs.

Be very careful when you hear distributed graphs, partitions, planet scale and the like. If you start having relationships that have to be traversed over the network you will always suffer performance issues. Most of the distributed graphs platforms note also that for maximum performance you have to store everything on one partition (which defeats the partitioning purpose).

Christophe Willemsen
  • 19,399
  • 2
  • 29
  • 36
  • But what if your 'graph' in SQL is in 1 indexed table? Than you don't do joins, right? – M1sterPl0w Jan 23 '21 at 14:35
  • 1
    Well, I don't think you can represent `(person)-[:WORKS_FOR]->(company)->[:LOCATION]->(city)-[:IN_PROVINCE]->(province)<-[:LOOK_FOR_JOB_IN_PROVINCE]-(applicant)` in one table row. If you have one table, you will do joins across multiple rows ;-) – Christophe Willemsen Jan 23 '21 at 18:15
  • Yeah you are right. I thought all nodes were saved in one table and all edges in one table. But you should store them per type (https://learn.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-architecture?view=sql-server-ver15). Stupid, this article is written in 2018, never found this before. But thanks for your answer! It helped a lot! – M1sterPl0w Jan 25 '21 at 09:31