2

I have a model called CacheSync, mysql shows that it has an index:

mysql> show indexes from cache_syncs;
+-------------+------------+---------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table       | Non_unique | Key_name                        | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+---------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| cache_syncs |          0 | PRIMARY                         |            1 | id          | A         |       90878 |     NULL | NULL   |      | BTREE      |         |               |
| cache_syncs |          1 | index_cache_syncs_on_created_at |            1 | created_at  | A         |       18175 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------------+------------+---------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.01 sec)

But when I go to explain, it says it's not using an index:

CacheSync.where("created_at < ?", (Time.now - 1.hour).to_time).explain
 =>
EXPLAIN for: SELECT `cache_syncs`.* FROM `cache_syncs` WHERE (created_at < '2022-06-13 19:37:23.316439')
+----+-------------+-------------+------+---------------------------------+------+---------+------+-------+-------------+
| id | select_type | table       | type | possible_keys                   | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------------+------+---------------------------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | cache_syncs | ALL  | index_cache_syncs_on_created_at | NULL | NULL    | NULL | 93651 | Using where |
+----+-------------+-------------+------+---------------------------------+------+---------+------+-------+-------------+
1 row in set (0.00 sec)

Why isn't it using the index?

Thanks for any help, kevin

user1130176
  • 1,772
  • 1
  • 23
  • 33
  • What proportion of the rows in that table match the condition? If the optimizer estimates that your query is matching a large enough portion of the table, it guesses that it might as well just do a table-scan instead of wasting time with the index. – Bill Karwin Jun 13 '22 at 20:47
  • hmm great question, definitely more than half. – user1130176 Jun 13 '22 at 20:48
  • you could test @BillKarwin 's theory by changing the query to, say, `where("created_at > ?", 1.minute.ago)`, just to reduce the number of matches – Les Nightingill Jun 14 '22 at 22:07

1 Answers1

3

In my experience, if the optimizer estimates that your condition matches over 20% of the table, it'll fall back to a table-scan. It guesses that it's quicker to read all the rows from the clustered index than to look up values in the secondary index, then do another lookup to get the corresponding rows from the table.

The 20% threshold is not any official feature, it's just what I've observed. It is not configurable in current versions of MySQL.

You can use an index hint to convince it that a table-scan is prohibitively expensive:

SELECT ... FROM mytable FORCE INDEX (index_cache_syncs_on_created_at) WHERE ...

Then it will do a table-scan only if the index you name is irrelevant to the conditions in the query.

See https://dev.mysql.com/doc/refman/8.0/en/index-hints.html for more info on index hints.

I'm not a Rails developer, but this old answer shows one method of conveying index hint syntax to Rails: https://stackoverflow.com/a/13904227/20860 I don't know if that's still currently the way to do it.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828