12

I get a "Range checked for each record (index map: 0x1)" in EXPLAIN SELECT when doing an INNER JOIN on a PRIMARY key with 2 values (using either IN or OR constructs)

Here is the query:

SELECT *
FROM message AS m
INNER JOIN user AS u
ON u.id = m.sender_id OR u.id = m.receiver_id

When doing an explain, it gives me:

+----+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                                         |
+----+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------+
|  1 | SIMPLE      | u     | ALL  | PRIMARY       | null | null    | null | 75000 | Range checked for each record (index map: 0x1)|
+----+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------+

It can't be...

If I try this I get the same result:

SELECT *
FROM message AS m
INNER JOIN user AS u
ON u.id IN(m.sender_id, m.receiver_id)

But if I do this, it works fine and I get only 1 row parsed:

SELECT *
FROM message AS m
INNER JOIN user AS u
ON u.id = m.sender_id

How is this possible? I'm joining on a primary key with same-type values. (the actual query is "a bit" more complicated but nothing fancy, 2 inner joins and lastly one left join)

It should be 2 rows, period.

Thanks for any input on this (did some research but did not find anything valuable except for "please add an index", which is obviously not applicable here)

EDIT: yes, I tried the USE INDEX statement, but still no luck

EDIT: Here is a very simple schema to reproduce this weird behavior of MySQL:

CREATE TABLE test_user (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(30),
    PRIMARY KEY (id)
);

CREATE TABLE test_message (
    id INT NOT NULL AUTO_INCREMENT, 
    sender_id INT NOT NULL,
    receiver_id INT NOT NULL,
    PRIMARY KEY (id),
    INDEX idx_sender (sender_id),
    INDEX idx_receiver (receiver_id)
);

EXPLAIN SELECT *
FROM test_message AS m
INNER JOIN test_user AS u
    ON u.id = m.sender_id OR u.id = m.receiver_id;
Lideln Kyoku
  • 952
  • 9
  • 20
  • Is inner join the correct join method? You're not supposed to use FULL as well? – hansmei Nov 03 '14 at 18:09
  • Hmmmm, I see you have only primary indexes set up id numbers. Try adding indexes to sender_id and receiver_id, that may help the pairing. – David162795 Nov 04 '14 at 08:49
  • Okay I did, it does not help with the IN construct, but it works with the OR construct! I thought that indexes were to be set only on the columns that were being searched (user.id), not the (already selected) ones where the values to test come from (task.id_user). Can you please post an actual answer (that I will be allowed to accept) and explain to me (and others) why the index should also be set on the reference column, not only the searched one? Thank you David! – Lideln Kyoku Nov 04 '14 at 11:16
  • Wait, I spoke too fast. It still does not work. It only fixes the one and single use case where I do: u.id = t.id_user OR u.id = t.id_user (which is pointless but was still causing the issue and made the query even simpler). I'm sorry that it does not work. Any other idea? I'm sure I'm not the only one trying to compare a column to 2 columns using OR. I'm surprised I did not come accross that earlier. – Lideln Kyoku Nov 04 '14 at 12:00
  • @hansmei, MySQL does not support `FULL OUTER JOIN`. And there is no such thing in SQL as `FULL INNER JOIN`. – Bill Karwin Nov 02 '17 at 21:16
  • Please don't clarify in comments; edit your question. Also please do not keep adding "EDIT"s. Just edit your question to be the best self-contained up-to-date version. Past versions are accessible via the "edited" link. – philipxy Nov 04 '17 at 03:37
  • 3
    Hi. What do you mean, "it works fine"? Or "it does not work". Also you don't actually say what you expect. Even though you might think that's obvious. Please read & act on [mcve]. Here, you have no example input, output & desired output. And what does "I get only 1 row parsed" mean? And give a complete example (code & data) that exhibits your problem. And in giving your specification, what are any other *declarable* unique & FKs column sets & non-null columns? PS What do mean by the "reference" & "searched" columns? – philipxy Nov 04 '17 at 03:44
  • @LidelnKyoku - Something is wrong with the first query and its `EXPLAIN` -- The query mentions two tables; the `EXPLAIN` mentions only one. Please fix. – Rick James Nov 06 '17 at 15:58

2 Answers2

9

In general, MySQL can use only one index per table reference in a query (there's an index-merge algorithm, but this doesn't work as often as you might think).

Your join condition has an OR between two comparisons to indexed columns, and the optimizer can't choose which is the better one to use before the data in the table is examined row-by-row.

A common workaround is to do a UNION between simpler queries, instead of the OR condition.

mysql> EXPLAIN 
    SELECT * FROM test_message AS m 
    INNER JOIN test_user AS u ON u.id = m.sender_id 
  UNION
    SELECT * FROM test_message AS m 
    INNER JOIN test_user AS u ON u.id = m.receiver_id;

+----+--------------+------------+--------+---------------+---------+---------+--------------------+------+-----------------+
| id | select_type  | table      | type   | possible_keys | key     | key_len | ref                | rows | Extra           |
+----+--------------+------------+--------+---------------+---------+---------+--------------------+------+-----------------+
|  1 | PRIMARY      | m          | ALL    | idx_sender    | NULL    | NULL    | NULL               |    1 | NULL            |
|  1 | PRIMARY      | u          | eq_ref | PRIMARY       | PRIMARY | 4       | test.m.sender_id   |    1 | NULL            |
|  2 | UNION        | m          | ALL    | idx_receiver  | NULL    | NULL    | NULL               |    1 | NULL            |
|  2 | UNION        | u          | eq_ref | PRIMARY       | PRIMARY | 4       | test.m.receiver_id |    1 | NULL            |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL          | NULL    | NULL    | NULL               | NULL | Using temporary |
+----+--------------+------------+--------+---------------+---------+---------+--------------------+------+-----------------+

This does use proper index lookups in both sub-queries, but it has to use a temporary table to finish the UNION afterwards. Ultimately, it might be a wash for performance. Depends on how many rows of data need to be examined, and how many rows are produced as the result.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • I cannot check but believe that using "UNION ALL" will help to avoid temporary table(potentially it may cause duplicates but I suppose it's quite rare case to send message to yourselves) – skyboyer Nov 07 '17 at 21:10
  • 1
    @skyboyer: MySQL 5.7.3 and later can eliminate the temporary table when you use `UNION ALL`. See specific notes at https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-3.html. Earlier versions of MySQL don't have this optimization. – Bill Karwin Nov 07 '17 at 21:20
1

The problem is well known also in other (I think all) RDBMS, the optimizer will use only one rule for each join.

If the join condition is complex or if it can't recognize a known pattern to solve it, no optimization will be applied and it will go for full table scan.

In your case the OR condition in main join, seems simple but it is not, because you are asking to check each user id against two different columns (not constant values) at a time.

To solve it you must split your join condition in more sub-queries so optimizer can use the better rule for each one.

@Bill Karwin has proposed the common solution and it helps well to understand the problem.

A (slightly) better way to solve this problem is to move the union up one level and join on the derived table:

EXPLAIN 
SELECT *
FROM test_user AS u
INNER JOIN (
    select id, sender_id as msg_id
    from test_message 
    union all 
    select id, receiver_id 
    from test_message 
    ) AS m 
ON u.id = m.msg_id;

It will not use TEMPORARY tables and does only one full table scan on test_users instead of two

id  select_type table           partitions  type    possible_keys   key             key_len ref         rows    filtered    Extra
1   PRIMARY     u               NULL        ALL     PRIMARY         NULL            NULL    NULL        1       100.00      NULL
1   PRIMARY     <derived2>      NULL        ref     <auto_key0>     <auto_key0>     4       test.u.id   2       100.00      NULL
2   DERIVED     test_message    NULL        index   NULL            idx_sender      4       NULL        1       100.00      "Using index"
3   UNION       test_message    NULL        index   NULL            idx_receiver    4       NULL        1       100.00      "Using index" 
MtwStark
  • 3,866
  • 1
  • 18
  • 32
  • Try that `EXPLAIN` again, after you have added a few dozen rows to the tables. You might find that it does _not_ start with a full table scan of `u`. But, there is still a tmp table implied in ``. Furthermore there is a full table scan of it (ok, only 2 rows) to build an index (``). – Rick James Nov 04 '17 at 02:28
  • @RickJames I did my tests with 6 users and 50 messages and the execution plan does not change. This query is similar to @BillKarwin one, but it does only 1 full table scan of u and only 1 nested loop (`JOIN`) instead of 2 and 2. From MySQL 5.7 building the merged index `` for derived table should have good performances. I mean, to join a union shoud be better than the union of two joins. – MtwStark Nov 06 '17 at 15:30
  • Whoa! The Question wanted all columns from both `test_message` and `test_user`; the query here delivers only the columns from `test_user`. So, I suggest it is not a valid Answer. – Rick James Nov 06 '17 at 15:53
  • the question was "How is this possible to get 'Range checked for each record' in this scenario?" and my answer give a description of the problem and an alternative solution to avoid it. If you do not like it, don't worry.. I will survive.. – MtwStark Nov 06 '17 at 17:16
  • 'Range checked for each record' is so cryptic that I don't know what it means. – Rick James Nov 06 '17 at 17:22
  • ?!?!? I thought you were the MySQL Expert.. and also that you read the question (at least the title) – MtwStark Nov 06 '17 at 17:36