Degrees of Separation Query

Question

I have a table of member-to-member connections. The schema is member_id, friend_id, is_active. I want to build a list of member connections of people who are friends of friends. I'm not really sure how to tackle the query, let alone in a semi-optimized way.

The table above works in a manner where the member_id and friend_id are essentially the same thing on another table. In my system, these id's are generally referred to as member_id except on this one table. For example, let's say my member_id is 21. My number can be on an infinite amount of other rows as either the member_id or the friend_id it's either based on who initiated the actual friendship request originally, that and I didn't want redundant data where I'd have dupe rows to basically do the same thing.

I'd like to have a query where I can not only establish a level of degree (think LinkedIn) but I can also establish how many mutual friends one person may have that's being displayed (think Facebook). The x factor here is the is_active column I mentioned earlier. This column could be 0 or 1. It's a simple tinyint column that acts as a on/off switch. Any friend connections with a 1 would be an active friendship whereas 0 is pending. I need to base this query off my active friends and their active friends and so on. Where none of the active friends my friends have are active friends of mine.

How can I construct a query like this (even if I can't show the level of separation and only get a mutual count)? Right now, I can sort of think of something but it involves query after query some nested in loops, and yea, I just can't picture that being anything good for my servers' overall performance or health over time.

It seems that with most "shortest path" algorithms, a unidirectional path makes things much simpler, so don't worry about duplication too much. — Marcus Adams, Feb 27 '12 at 19:50

Marcus Adams · Accepted Answer · 2012-02-27T21:02:27.050

Here's how to perform the search using a breadth-first, shortest path search, using JOIN. There is no magic in this algorithm, as we're using MySQL to find our answer, and we're not incorporating any fancy search algorithm that uses any kind of heuristics or optimization.

My 'friends' table has unidirectional relationships, so we do have duplicates in the sense that both '1 to 2' and '2 to 1' are stored. I'm also excluding is_active since the implementation will be obvious:

Here's the data:

member_id   friend_id
1           2
1           3
1           4
2           1
2           3
2           5
2           6
3           2
3           1
4           1
5           2
6           2
6           7
7           6
7           8
8           7

We have member 1 selected, and we're asking is 1 friends with 7, a friend of a friend, etc? A count of 0 means no, and a count of 1 means yes.

SELECT COUNT(*)
FROM friends f1
WHERE f1.member_id = 1
  AND f1.friend_id = 7

If no, then are they friend of a friend?

SELECT COUNT(*)
FROM friends f1
JOIN friends f2
  ON f2.member_id = f1.friend_id
WHERE f1.member_id = 1
  AND f2.friend_id = 7

If no, then friend of a friend of a friend?

SELECT COUNT(*)
FROM friends f1
JOIN friends f2
  ON f2.member_id = f1.friend_id
JOIN friends f3
  ON f3.member_id = f2.friend_id
WHERE f1.member_id = 1
  AND f3.friend_id = 7

And so on...

The third query would find the path '1 to 2', '2 to 6', and '6 to 7', returning the count of 1.

Each query becomes more expensive (due to the larger number of joins), so you may want to limit the search at some point. One cool thing is that this search works from both ends toward the middle, which is one simple optimization suggested for shortest path searches.

Here's how to find those mutual friend recommendations for member 1:

SELECT f2.friend_id
FROM friends f1
JOIN friends f2
  ON f2.member_id = f1.friend_id
LEFT JOIN friends f3
  ON f3.member_id = f1.member_id
  AND f3.friend_id = f2.friend_id
WHERE f1.member_id = 1
  AND f2.friend_id <> f1.member_id // Not ourself
  AND f3.friend_id IS NULL // Not already a friend

This comes in handy when combined with COALESCE – Darwayne Aug 06 '15 at 01:24 — Darwayne, Aug 06 '15 at 01:24

score 2 · Answer 2 · answered Feb 27 '12 at 16:38

Without specifics of tables, I can offer the following guidance... If you run your query to ALWAYS put the LOWER ID in the first position, and do distinct (or even count to see how frequent common person is/may be to other parties), you would remove the bloat.

ex:

select
      case when table.MemberID < table.FriendID
         then table.MemberID else table.FriendID end as FirstPerson,
      case when table.MemberID < table.FriendID
         then table.FriendID else table.MemberID end as SecondPerson
   from
     ...
   where...

So, if your data has

member ID   Friend ID
1           2
1           3
1           4
2           1
2           3
2           5
3           2
5           2

and you queried for friends / associations with member ID 1 you would start with
1  2
1  3
1  4

but then friendships from ID #2 would return
1  2  (reversal of 2 / 1 entry) would be duplicate
2  3
2  5

then from friendship 3
2  3  (reversal of 3 / 2 entry) would be duplicate

then from friendship 5 from member 2
2  5  (reversal of 5 / 2 entry) would be dupliate

Not sure this is exactly what you are looking for, but sounds similar to other "social network" finding friends/associations. As for how many "degrees" from a person's association/friendship, you would probably have to nest your queries, or at least keep querying from within some looping structure.

That is partcially helpful, what more type of stuff would you need to know more of that could possibly help me come up with a distinct solution? As for the reference to "social network" yea, in concept there is some of that there its more of a tight nit type of thing, as it is also me just trying to learn for the sake of learning. — chris, Feb 27 '12 at 16:51

Darwayne · Answer 3 · 2015-08-06T01:43:50.843

1

To further improve on the accepted answer, you can utilize coalesce to check each degree of separation until it's found. e.g:

SELECT COALESCE( (SELECT 1 FROM friends f1 WHERE f1.member_id = 1 AND f1.friend_id = 7 LIMIT 1), (SELECT 2 FROM friends f1 JOIN friends f2 ON f2.member_id = f1.friend_id WHERE f1.member_id = 1 AND f2.friend_id = 7 LIMIT 1) /*, ..ETC* ) as degrees_away

edited Aug 06 '15 at 01:43

answered Aug 06 '15 at 01:34

Darwayne

1,400
14
14

Degrees of Separation Query

3 Answers3

Linked