2


I've read several articles (like this) stating that graph DBs are inherently faster then RDBs when running graph traversal algorithms because of the index-free adjacency. However, I'm having trouble understanding the theoretical justification for it. It seems to me that if you construct a hash-indexed adjacency tables, you should reach the same complexity performance.

For example, finding the friends of a person (given the person id) using an RDB with 2 tables: people and friendships

1) Locating the friends: O(m) - where m is the number of friends.
2) For each friend Id, locating in people: O(1)
Total: O(m)

In a graph DB, this should be the same, no?

Marumba
  • 359
  • 3
  • 12

1 Answers1

2

No, queries are executed differently in RDBMS than in a graph database.

  1. The example you gave is finding a friend of a given person, which is a one-hop query (in graph terms) and is quite easy in both kinds of databases.

    However, if you want to perform an n-hop query (n > 3), in RDBMS you can use subquery or join and the performance would be dependent on your optimizer.

    Below is an example:

    Assume that we have tables class with fields id (PRIMARY KEY) and name, student with fields id (PRIMARY KEY), name and class_id.

In order to find the class name whose id is 2, and the corresponding students, we need to join between two tables table class and student

SELECT c.name as c_name, s.name as s_name 
  FROM class as c 
    LEFT JOIN student as s 
      ON c.id = s.class_id 
        WHERE c.id = 2;

Explain the query: Query explained in table

The whole student table will be scanned in order to find class_id=2.

Of course we can create an index on student class_id column.

table of the index

It reads the student_class index to get the pointers to the physical rows and then read the records as it is a non clustered index.

In graph database, data are modeled as nodes and connections.

graph database model

To find the class name whose id is 2, and the corresponding students, just get the class node and traverse in backwards direction on select connections. And avoid the join index lookup performance problem.

  1. If you want to find the shortest path and all possible paths between two points (but you don't know how many hops there are for the query), then there would be much trouble using RDBMS. The query would be quite long. LDBC has some nice cases using SQL, GQL (Cypher) and SparQL respectively. Unfortunately I haven't found the runtime differences among the different languages.

  2. It is difficult to do graph computing like LPA (Label Propagation Algorithm) and Page Rank algorithm with RDBMS. But would be much easier to do so in some(most) Graph DBMS

venegr
  • 148
  • 8
  • 1. But why would an RDB with subqueries explode. Even doing no optimization will perform the inner query first returning m1 results, which will then be used by the previous subquery etc. I don't see how this will result in the state space exploding to D^n as you describe 2. Thanks for the link. I was quite surprised to see that Cypher queries are not more succinct then SQL.I thought that since a graph is often a more natural way of looking at data (at least to me), it will result in shorter queries. – Marumba Feb 23 '20 at 07:20
  • 1
    @Marumba I have updated the first point in my answer by adding a query example. – venegr Feb 26 '20 at 06:14
  • 1
    @Marumba You may also refer to [this post](https://stackoverflow.com/questions/13046442/comparison-of-relational-databases-and-graph-databases) for the differences between RDBMS and graph databases. – venegr Feb 26 '20 at 06:23
  • Using your example, if both s.class_id and c.id had hash indexes defined on them. Wouldn't the join index lookup performance penalty disappear (I'm talking strictly on theoretical complexity)? – Marumba Feb 26 '20 at 16:18
  • It needs enough buckets to hold the data for hash indexes. For a more complex scenario and huge databases, graph DB benefits. For example, Recommendation the goods that my friends have bought and returned the corresponding sellers' info. According to https://www.oberlo.com/blog/amazon-statistics , there were 150.6 million mobile users, about 120 million products and 2.5 million sellers on Amazon 2019. if we do not have enough memory to hold the buckets, hash indexes would be slower. – venegr Mar 10 '20 at 09:26