0

I'm working on a simple social network where there are users and their requests to friends in the same MySQL database

I need to organize a quick search for users. I need to find users who have not yet been sent a request to friends.

At the moment I have this structure:

mysql> SELECT * FROM profiles;
+----+---------+-----+---------+------------+
| id | name    | age | city_id | country_id |
+----+---------+-----+---------+------------+
|  1 | WILLIAM |  20 |       1 |          1 |
|  2 | JOHN    |  24 |       1 |          1 |
|  3 | ROBERT  |  21 |       3 |          2 |
|  4 | MICHAEL |  33 |       4 |          2 |
|  5 | JAMES   |  27 |      16 |          1 |
|  6 | DAVID   |  21 |      13 |        666 |
|  7 | RICHARD |  18 |       4 |          2 |
|  8 | CHARLES |  32 |      88 |          5 |
|  9 | JOSEPH  |  29 |       5 |          1 |
| 10 | THOMAS  |  19 |       1 |          1 |
+----+---------+-----+---------+------------+

mysql> SELECT * FROM request_for_friendship;
+----+---------+-------+
| id | from_id | to_id |
+----+---------+-------+
|  1 |       1 |     2 |
|  2 |       1 |     3 |
|  3 |       1 |     8 |
|  5 |       4 |     1 |
|  6 |       9 |     1 |
+----+---------+-------+

When user with id = 1 send request "show me users" the server must return 1 user, which hasn't request in request_for_friendship and the result should be filtered by city_id, county_id and age

My first SQL was with NOT EXIST (select 1 random row with complex filtering):

SELECT *
FROM
    (
        SELECT *, ABS(profiles.age - 21) AS nearest_age
        FROM profiles
        WHERE profiles.id != 1
        ORDER BY profiles.city_id <> 1, profiles.country_id <> 1, nearest_age
    ) AS users
WHERE
    NOT EXISTS (
        SELECT *
        FROM request_for_friendship
        WHERE
            (
                request_for_friendship.from_id = 1
                AND
                request_for_friendship.to_id = users.id
            )
            OR
            (
                request_for_friendship.from_id = users.id
                AND
                request_for_friendship.to_id = 1
            )
    )
LIMIT 0 , 1;

Result without LIMIT:

+----+---------+-----+---------+------------+-------------+
| id | name    | age | city_id | country_id | nearest_age |
+----+---------+-----+---------+------------+-------------+
| 10 | THOMAS  |  19 |       1 |          1 |           2 |
|  5 | JAMES   |  27 |      16 |          1 |           6 |
|  6 | DAVID   |  21 |      13 |        666 |           0 |
|  7 | RICHARD |  18 |       4 |          2 |           3 |
+----+---------+-----+---------+------------+-------------+

Everything was fine until 10,000 users registered and sent 500,000 requests for friendship. After that, each user who filtered through NOT EXISTS spent ~0.05 sec Therefore, if the user sent 100 requests, then 0.05 * 100 = 5 sec was spent for filtering 1 user.

It became clear that you can not use NOT EXISTS for filtering, since it is run each time for each user.

My second SQL was with LEFT JOIN (mysql: how to save ORDER BY after LEFT JOIN without reorder?):

SELECT * FROM
(
    SELECT *, ABS(profiles.age - 21) AS nearest_age
    FROM profiles
    WHERE profiles.id != 1
    ORDER BY profiles.city_id <> 1, profiles.country_id <> 1, nearest_age
) as users
    LEFT JOIN request_for_friendship
    AS request_for_friendship_copy
    ON
    (
        request_for_friendship_copy.from_id = 1
        AND
        request_for_friendship_copy.to_id = users.id
    )
    OR
    (
        request_for_friendship_copy.from_id = users.id
        AND
        request_for_friendship_copy.to_id = 1
    );
LIMIT 1;

Result without LIMIT:

+----+---------+-----+---------+------------+-------------+------+---------+-------+
| id | name    | age | city_id | country_id | nearest_age | id   | from_id | to_id |
+----+---------+-----+---------+------------+-------------+------+---------+-------+
|  2 | JOHN    |  24 |       1 |          1 |           3 |    1 |       1 |     2 |
|  3 | ROBERT  |  21 |       3 |          2 |           0 |    2 |       1 |     3 |
|  8 | CHARLES |  32 |      88 |          5 |          11 |    3 |       1 |     8 |
|  4 | MICHAEL |  33 |       4 |          2 |          12 |    5 |       4 |     1 |
|  9 | JOSEPH  |  29 |       5 |          1 |           8 |    6 |       9 |     1 |
|  5 | JAMES   |  27 |      16 |          1 |           6 | NULL |    NULL |  NULL |
|  6 | DAVID   |  21 |      13 |        666 |           0 | NULL |    NULL |  NULL |
|  7 | RICHARD |  18 |       4 |          2 |           3 | NULL |    NULL |  NULL |
| 10 | THOMAS  |  19 |       1 |          1 |           2 | NULL |    NULL |  NULL |
+----+---------+-----+---------+------------+-------------+------+---------+-------+

This SQL was very fast (~0.02s), but as you can see ORDER BY is broken. When I moved ORDER BY to bottom (after JOIN) it took ~3.2s. It is better but when users count will be ~1 000 000 it will take a lot of time. I did not find a way to keep the sorting with LEFT JOIN.

Now I'm thinking about creating a personalized table for each user, where only their requests to friends will be stored

Thus, we can exclude users as in the first version of my SQL with NOT EXISTS But now all users will be filtered by their personal requests to friends

For example, in the first variant, for filtering 1 the user NOT EXISTS searched his requests for friends among 500,000 other requests. Now, for the filtering of 1 user, NOT EXISTS will check only 100 - 1000 requests to friends personally for THIS user. But this method requires the creation of millions of tables in the database.

How good is this idea? What other good solutions for this task can you offer?

P.S. Sorry for my english

Community
  • 1
  • 1
mixalbl4
  • 3,507
  • 1
  • 30
  • 44

1 Answers1

0

Do you really think that making tens of thousands of tables is a good idea?

The NOT EXISTS is just fine, you are probably only missing index. You need two indexes, on (from_id, to_id) and on (to_id, from_id). You need both of them. You can also try to rewrite NOT EXISTS (A OR B) to NOT EXISTS A AND NOT EXISTS B, but it would probably be the same.

Antonín Lejsek
  • 6,003
  • 2
  • 16
  • 18