I'm working on a simple social network where there are users and their requests to friends in the same MySQL database
I need to organize a quick search for users. I need to find users who have not yet been sent a request to friends.
At the moment I have this structure:
mysql> SELECT * FROM profiles;
+----+---------+-----+---------+------------+
| id | name | age | city_id | country_id |
+----+---------+-----+---------+------------+
| 1 | WILLIAM | 20 | 1 | 1 |
| 2 | JOHN | 24 | 1 | 1 |
| 3 | ROBERT | 21 | 3 | 2 |
| 4 | MICHAEL | 33 | 4 | 2 |
| 5 | JAMES | 27 | 16 | 1 |
| 6 | DAVID | 21 | 13 | 666 |
| 7 | RICHARD | 18 | 4 | 2 |
| 8 | CHARLES | 32 | 88 | 5 |
| 9 | JOSEPH | 29 | 5 | 1 |
| 10 | THOMAS | 19 | 1 | 1 |
+----+---------+-----+---------+------------+
mysql> SELECT * FROM request_for_friendship;
+----+---------+-------+
| id | from_id | to_id |
+----+---------+-------+
| 1 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 1 | 8 |
| 5 | 4 | 1 |
| 6 | 9 | 1 |
+----+---------+-------+
When user with id = 1
send request "show me users" the server must return 1 user, which hasn't request in request_for_friendship
and the result should be filtered by city_id
, county_id
and age
My first SQL was with NOT EXIST
(select 1 random row with complex filtering):
SELECT *
FROM
(
SELECT *, ABS(profiles.age - 21) AS nearest_age
FROM profiles
WHERE profiles.id != 1
ORDER BY profiles.city_id <> 1, profiles.country_id <> 1, nearest_age
) AS users
WHERE
NOT EXISTS (
SELECT *
FROM request_for_friendship
WHERE
(
request_for_friendship.from_id = 1
AND
request_for_friendship.to_id = users.id
)
OR
(
request_for_friendship.from_id = users.id
AND
request_for_friendship.to_id = 1
)
)
LIMIT 0 , 1;
Result without LIMIT:
+----+---------+-----+---------+------------+-------------+
| id | name | age | city_id | country_id | nearest_age |
+----+---------+-----+---------+------------+-------------+
| 10 | THOMAS | 19 | 1 | 1 | 2 |
| 5 | JAMES | 27 | 16 | 1 | 6 |
| 6 | DAVID | 21 | 13 | 666 | 0 |
| 7 | RICHARD | 18 | 4 | 2 | 3 |
+----+---------+-----+---------+------------+-------------+
Everything was fine until 10,000 users registered and sent 500,000 requests for friendship.
After that, each user who filtered through NOT EXISTS
spent ~0.05 sec
Therefore, if the user sent 100 requests, then 0.05 * 100 = 5 sec
was spent for filtering 1 user.
It became clear that you can not use NOT EXISTS
for filtering, since it is run each time for each user.
My second SQL was with LEFT JOIN
(mysql: how to save ORDER BY after LEFT JOIN without reorder?):
SELECT * FROM
(
SELECT *, ABS(profiles.age - 21) AS nearest_age
FROM profiles
WHERE profiles.id != 1
ORDER BY profiles.city_id <> 1, profiles.country_id <> 1, nearest_age
) as users
LEFT JOIN request_for_friendship
AS request_for_friendship_copy
ON
(
request_for_friendship_copy.from_id = 1
AND
request_for_friendship_copy.to_id = users.id
)
OR
(
request_for_friendship_copy.from_id = users.id
AND
request_for_friendship_copy.to_id = 1
);
LIMIT 1;
Result without LIMIT:
+----+---------+-----+---------+------------+-------------+------+---------+-------+
| id | name | age | city_id | country_id | nearest_age | id | from_id | to_id |
+----+---------+-----+---------+------------+-------------+------+---------+-------+
| 2 | JOHN | 24 | 1 | 1 | 3 | 1 | 1 | 2 |
| 3 | ROBERT | 21 | 3 | 2 | 0 | 2 | 1 | 3 |
| 8 | CHARLES | 32 | 88 | 5 | 11 | 3 | 1 | 8 |
| 4 | MICHAEL | 33 | 4 | 2 | 12 | 5 | 4 | 1 |
| 9 | JOSEPH | 29 | 5 | 1 | 8 | 6 | 9 | 1 |
| 5 | JAMES | 27 | 16 | 1 | 6 | NULL | NULL | NULL |
| 6 | DAVID | 21 | 13 | 666 | 0 | NULL | NULL | NULL |
| 7 | RICHARD | 18 | 4 | 2 | 3 | NULL | NULL | NULL |
| 10 | THOMAS | 19 | 1 | 1 | 2 | NULL | NULL | NULL |
+----+---------+-----+---------+------------+-------------+------+---------+-------+
This SQL was very fast (~0.02s
), but as you can see ORDER BY
is broken. When I moved ORDER BY to bottom (after JOIN) it took ~3.2s
.
It is better but when users count will be ~1 000 000 it will take a lot of time. I did not find a way to keep the sorting with LEFT JOIN
.
Now I'm thinking about creating a personalized table for each user, where only their requests to friends will be stored
Thus, we can exclude users as in the first version of my SQL with NOT EXISTS But now all users will be filtered by their personal requests to friends
For example, in the first variant, for filtering 1 the user NOT EXISTS
searched his requests for friends among 500,000 other requests.
Now, for the filtering of 1 user, NOT EXISTS
will check only 100 - 1000 requests to friends personally for THIS user.
But this method requires the creation of millions of tables in the database.
How good is this idea? What other good solutions for this task can you offer?
P.S. Sorry for my english