0

There has a simple requirement that is query the amount of the Six Degrees relationship from a Friend table.

The structure of the Friend is like this:

+----------+---------+------+-----+---------+----------------+
| Field    | Type    | Null | Key | Default | Extra          |
+----------+---------+------+-----+---------+----------------+
| id       | int(11) | NO   | PRI | NULL    | auto_increment |
| userId   | int(11) | NO   | MUL | NULL    |                |
| friendId | int(11) | NO   |     | NULL    |                |
+----------+---------+------+-----+---------+----------------+

Assume I want to know the Six Degrees relationship amount of userId:1, And I wrote down six queries like this
SELECT friendId FROM Friend WHERE userId = 1 to get the one degree friends.

Then execute
SELECT friendId FROM Friend WHERE userId in (/*above query result*/)
five times.

The problem is not as simple as it looks like, cause I have millions records in Friend table.

There is a strong possibility is the Six Degrees relationship amount of user 1 is greater than six digits, although he/she only have two friends in One Degree relationship.

The number of items in the IN clause is exponentially.

Then the six queries taking more than one minute to get result.

How to optimize this situation?

Joe
  • 3,581
  • 4
  • 23
  • 28
  • create a temporary table with an INDEX on the JOINED column and then use JOIN instead of IN – Borjante Nov 25 '15 at 14:15
  • This might be of interest: [Challenge,how to implement an algorithm for six degree of separation?](http://stackoverflow.com/questions/2076715/challenge-how-to-implement-an-algorithm-for-six-degree-of-separation) – jpw Nov 25 '15 at 14:18

2 Answers2

0

You can use subqueries and see if MySQL optimizer is clever enough to rewrite them as joins (it usually does).

But actually RDBMS is unsuitable for the task. Better look into graph-based databases. See this question for example.

Community
  • 1
  • 1
Vladislav Rastrusny
  • 29,378
  • 23
  • 95
  • 156
0

Create a temp table to hold the intermediate results, and JOIN instead of IN:

DROP TEMPORARY TABLE IF EXISTS tmp_friends;
CREATE TEMPORARY TABLE `tmp_friends` (
    `id` INT UNSIGNED NOT NULL,
    PRIMARY KEY (`id`)
);

INSERT INTO tmp_friends VALUES(<id of the given user>);

#run this 6 times
INSERT IGNORE INTO tmp_friends
SELECT f.userId
FROM tmp_friends t
JOIN Friend f ON f.friendId = t.id

SELECT f.*
FROM tmp_friends t
JOIN Friend f ON f.userId = t.id
Vatev
  • 7,493
  • 1
  • 32
  • 39