0

I know this question maybe asked a lot but I don't find the same case as mine so I wanted to ask to understand more details.

let's assume that I have a table called Users with a nullable integer column called age. I have made an index on that column called age_index.

I found that there exist 7 million users for example in the DB only 700 of them has age populated. And also the cardinality of the index is only 28. [which is not good I know, but let's assumed this is the case]

Whenever, I run a query like that:

SELECT *
FROM users
WHERE age IN (24,
              1)
  AND deleted_at IS NULL;

It uses the index and fetches the user details very fast. But, whenever the list of ages size in the query increases to 200 distinct ages or more. Mysql decides not to use the index and to search in 7 million records.

Second query:

SELECT * FROM users 
       WHERE age IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,
         22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,
        44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,
        67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,
       90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,
     110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,
     128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,
     146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,
     164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,
     182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,
      197,198,199,200) 
    AND deleted_at is null;

However, when I explain the previous query, I can find the key is in the possible keys but not used.

Any idea why?

** EDIT

Explain of the first query:

+----+-------------+--------+------------+-------+------------------------+------------------------+---------+------+------+----------+------------------------------------+
| id | select_type | table  | partitions | type  | possible_keys          | key                    | key_len | ref  | rows | filtered | Extra                              |
+----+-------------+--------+------------+-------+------------------------+------------------------+---------+------+------+----------+------------------------------------+
|  1 | SIMPLE      | users  | NULL       | range | age_index              | age_index              | 5       | NULL |  199 |    10.00 | Using index condition; Using where |
+----+-------------+--------+------------+-------+------------------------+------------------------+---------+------+------+----------+------------------------------------+

Explain of the second query:

+----+-------------+--------+------------+------+------------------------+------+---------+------+---------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys          | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+--------+------------+------+------------------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | users  | NULL       | ALL  | age_index              | NULL | NULL    | NULL | 7000000 |    10.00 | Using where |
+----+-------------+--------+------------+------+------------------------+------+---------+------+---------+----------+-------------+
Ankit Jindal
  • 3,672
  • 3
  • 25
  • 37
  • 1
    share `explain` of both the queries – aRvi Sep 13 '20 at 10:09
  • @aRvi updated the question with the explain for both queries – Haytham.Breaka Sep 13 '20 at 10:22
  • could you kindly share the second query please? –  Sep 13 '20 at 10:29
  • For a set of 7 million users you only have 700 having `age` populated. Does this mean that the age field is `NULL` on other users? Additionally have you tried to use `BETWEEN` since all ages are incremental? –  Sep 13 '20 at 11:08
  • @PeterDarmis yes the rest of the 7 million users has null age. No I didn't try `BETWEEN`. In this example, the numbers are incremental yes but I just wanted an example with distinct numbers – Haytham.Breaka Sep 13 '20 at 11:14
  • Try read this article https://dev.mysql.com/doc/refman/5.6/en/range-optimization.html and this one https://dev.mysql.com/doc/refman/8.0/en/explain-output.html#explain_key_len then try to understand why your `key_len` in the first query is 5. –  Sep 13 '20 at 11:45
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/221410/discussion-between-peter-darmis-and-haytham-breaka). –  Sep 13 '20 at 11:55
  • what all columns included in `age_index`? – aRvi Sep 13 '20 at 12:14
  • Why aren't you using `age < 200` OR `age BETWEEN 1 AND 200`? – Ankit Jindal Sep 13 '20 at 12:28
  • @aRvi `age_index` only contains the `age` column – Haytham.Breaka Sep 13 '20 at 14:33
  • @AnkitJindal I can't use range. Actually, the real use case contains random values for eager loading. – Haytham.Breaka Sep 13 '20 at 14:34
  • @Haytham.Breaka you can try doing `force index(age_index)` – aRvi Sep 13 '20 at 14:34
  • @aRvi something interesting happened now when I forced the index, the result of explain contained 58 million rows while there're only 7 million rows exist in the db. – Haytham.Breaka Sep 13 '20 at 14:40
  • @Haytham.Breaka can you share the query and EXPLAIN result – aRvi Sep 13 '20 at 14:45
  • @aRvi query: `explain select * from users force index(age_index) where age IN (1,2,3,4,5,6,7,8,...,200) and deleted_at is null;` – Haytham.Breaka Sep 13 '20 at 14:50
  • strange, this is showing possible key `age_index` and the key is `groups_parent_id_index`. – aRvi Sep 13 '20 at 14:53
  • @aRvi the correct explain result `id: 1 select_type: SIMPLE table: users partitions: NULL type: range possible_keys: age_index key: age_index key_len: 5 ref: NULL rows: 58449395 filtered: 10.00 Extra: Using index condition; Using where 1 row in set, 1 warning (0.00 sec)` – Haytham.Breaka Sep 13 '20 at 14:55

3 Answers3

1

MySql uses cost based optimizers. Sometimes accessing the data by index values is much more expensive query execution plan than normal table scan hence based on the cost of the query execution MySql maynot prefer to use index for some query.

You can check out this similar post MySQL not using indexes with WHERE IN clause? and read https://dev.mysql.com/doc/refman/8.0/en/mysql-indexes.html to see how mysql index works.

0

You sometimes have to force the use of an index by using a join instead of in. For instance, if you have a table ages with all the possible ages, do

select u.*
from ages
join users u using (age)
where ages.age in (...)

You may also need straight_join

ysth
  • 96,171
  • 6
  • 121
  • 214
0

Please try to change your query and try this:

SELECT * FROM `users` WHERE `age` IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200) AND `age` IS NOT NULL AND `deleted_at` IS NULL;