1

I am using using MySQL (InnoDB engine). I am running SQL query with WHERE IN statement which is also ordered.

I decided to create the perfect index, so my query will not run slow. Surprisingly, after creating the perfect index MySQL is still using filesort when running my query.

My Query:

SELECT *
    FROM `events`
    WHERE
        `events`.`user_id` IN (118, 105, 126, 196, 338)
    ORDER BY
        `date` ASC,
        `planing_hour_id` ASC;

My perfect Index:

ALTER TABLE `events`
    ADD INDEX `my_perfect_index` (
        `user_id`,
        `date` ASC,
        `planing_hour_id` ASC
    );

Output of EXPLAIN: EXPLAIN SQL Query Screenshot

Problem:

MySQL still uses filesort even when there is perfect index available. I would expect only Using Index condition to be present in output of EXPLAIN query. Why is this not the case?

Disclaimer:

I checked similar topics and none of them has been useful:

Fusion
  • 5,046
  • 5
  • 42
  • 51
  • https://stackoverflow.com/questions/2829544/mysql-how-to-index-an-or-clause – jarlh Apr 13 '23 at 09:32
  • @jarlh Mentioned topic discusses how composite indexes deal with OR clauses. Can you elaborate how it is relevant for this question? – Fusion Apr 13 '23 at 10:06
  • Create an index by ```(`date` ASC, `planing_hour_id` ASC, `user_id`)```. – Akina Apr 13 '23 at 10:27
  • That *perfect index* is not that perfect for the query. The query has an `IN` operator and MySQL may not be able to handle it well. Why you don't separate the query into 4 seeks instead? – The Impaler Apr 13 '23 at 13:32
  • @TheImpaler 4 seeks? can you elaborate and perhaps write an answer? I am not sure how would I split the query into 4 seeks. You mean utilize subquerying? – Fusion Apr 13 '23 at 18:29

2 Answers2

1

You can eliminate "Using filesort" if MySQL's natural order of reading the rows matches the order you request in your ORDER BY. Thus there is no sorting needed, because they are already read in the right order.

MySQL always reads in index order. Whether it uses a secondary index like yours, or the primary key (also called the clustered index), rows will be read in the order they are stored in that index. So if that order does not match your ORDER BY, then a filesort is required.

(Note "filesort" doesn't necessarily mean it will be on disk. If there are few rows in the result, the sorting can be done in RAM relatively quickly.)

So why doesn't your index count as a "perfect" index?

Think of this analogy:

SELECT ... FROM TelephoneBook 
WHERE last_name IN ('Addams', 'Kirk', 'Smith') 
ORDER BY first_name;

The result:

last_name first_name
Addams Morticia
Kirk James
Smith Sarah Jane

These are read in index order by last_name. But they are not implicitly in order by first_name, the order requested in the query. The query needs a phase to filesort the set of matching rows.

If you were to make the "perfect" index this way:

ALTER TABLE `TelephoneBook`
    ADD INDEX `my_perfect_index` (
        `first_name`,
        `last_name`
    );

The query might use this index, and eliminate the filesort, because it's reading the rows in order by first_name. However, this can't also do the search to find rows matching last_name most efficiently.

The best it can do is index condition pushdown, which is to tell the storage engine not to return rows that don't match the index — but the storage engine does have to examine all the indexes to evaluate them. This is not as good for optimization as reading from an index leading with last_name.

Ultimately, there is no perfect index in cases where the query must search with a range condition (IN() qualifies as a range condition), and also do a sort that doesn't match the column of the range condition.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thank you very much for the answer. Just out of curiosity - I also tried to declare non-composite `user_id` index. I thought, MySQL would pick it up instead of `my_perfect_index` - But surprisingly, it would not. Maybe secondary columns specified in index are somehow utilized there? – Fusion Apr 13 '23 at 18:48
  • I'm not sure, it seems the optimizer is doing something that is not documented well. For example, I see it reports "using index condition" indicating it is using index condition pushdown, but I see no reason why it would do that. It's a mystery! – Bill Karwin Apr 13 '23 at 19:18
1

If your concern is high performance you can try rewriting your query as:

select * from events where user_id = 118
union all select * from events where user_id = 105
union all select * from events where user_id = 116
union all select * from events where user_id = 196
union all select * from events where user_id = 338
order by date, planning_hour_id

This form effectivelly seeks the index five times (albeit hitting the secondary and primary index each time), then unions the result sets, and finally sort them.

In this case a simple index is useful, since the sorting won't use the index and will require materialization. Nothing bad with it unless the query ends up selecting thousands of rows.

create index i1 on events (user_id);
The Impaler
  • 45,731
  • 9
  • 39
  • 76