Efficient way to implement LinkedIn like "How you are connected to" feature?

Question

LinkedIn has this cool feature in which while visiting some user's profile, LinkedIn prompts how you are connecting to that user through the network.

Assuming that the visitor and the profile owner are two nodes of a graph where the nodes represent users and edge represents friendship, a simple solution could be a bfs starting from both the nodes up to certain level and see if there are any intersections. The intersections would be the network link-nodes.

Although this sounds neat, the problem is that in order to determine friends of each person, a separate DB query is needed. When the network goes deeper than 2 levels, it would be highly time consuming algorithm. Is there a better efficient alternative? If not, how can we add better hardware support (parallel computing, grids, distributed database etc) in order to bring down the time required for computation?

I had to remove the image from your post because ImageShack has deleted it and replaced it with advertising. See http://meta.stackexchange.com/q/263771/215468 for more information. If possible, it would be great for you to re-upload them. Thanks! — Undo, Sep 28 '15 at 03:04

score 5 · Accepted Answer · edited May 23 '17 at 12:25

You can see how this can be done in the article Graphs in the database: SQL meets social networks by Lorenzo Alberton. The example code is written for PostgreSQL using CTEs. However, I doubt that using a RDBMS for this will perform well. I wrote up an article on how to do the same stuff as in the mentioned article using a native graph database, in this case Neo4j: Social networks in the database: using a graph database. Other than the differences in performance, a graph database also simplifies the task by providing a graph API that makes it easy to handle traversals that would be extremely complex to write in SQL (or by using stored procedures). I wrote a bit more on graph databases in this thread and see this one too.

I read the https://inviqa.com/blog/storing-graphs-database-sql-meets-social-network doc, Although I am still confused about how we will get connections in LinkedIn which are 3degree+, to get an exact number there don't we have to traverse the whole graph till we reach an end. Or will that number be a guesstimate based on 2/3 degree connections? — Nikhil Verma, Aug 20 '21 at 07:32

score 1 · Answer 2 · answered Oct 13 '09 at 05:18

1

Without some kind of recursive stored procedure (CTE in SQL Server 2005+), you'll need multiple round trips as the levels get deeper. However, a good cache infrastructure could really help performance as the most popular / active users' connection lists would remain cached. A read/write through cache mechanism would make things even better (cache updates cascade to db updates, cache reads cascade to db reads)

answered Oct 13 '09 at 05:18

Chris

27,596
25
124
225

this is a good comment because a lot of people don't want to just rely on SQL Server CTEs, Procs, or other T-SQL to always do the grunt work. Store it in SQL Server and then as you stated Cache once in for example your C# app and use it in memory to look stuff up if it's only for a small set of data. – PositiveGuy Mar 24 '13 at 21:06

Efficient way to implement LinkedIn like "How you are connected to" feature?

2 Answers2

Linked