5

Given the following two queries:

Query #1

SELECT log.id
FROM log
WHERE user_id IN
      (188858, 188886, 189854, 203623, 204072)
      and type in (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;

Query #2 - 4 IDs instead 5

SELECT log.id
FROM log
WHERE user_id IN
      (188858, 188886, 189854, 203623)
      and type in (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;

Explain Plan

-- Query #1
1   SIMPLE  log range   idx_user_id_and_log_id  idx_user_id_and_log_id  4       41280   Using index condition; Using where; Using filesort
-- Query #2
1   SIMPLE  log index   idx_user_id_and_log_id  PRIMARY                 4       53534   Using where

Why the addition of a single ID makes the execution plan so different? I'm talking about a difference in time of milliseconds to ~1 minute. I thought that it could be related to the eq_range_index_dive_limit parameters, but it's bellow 10 anyway (the default). I know that I can force the usage of the index instead of the clustered index, but I wanted to know why MySQL decided that.

Should I try to understand that? Or sometimes it's not possible to understand query planner decisions?

Extra Details

  • Table Size: 11GB
  • Rows: 108 Million
  • MySQL: 5.6.7
  • Doesn't matter which ID is removed from the IN clause.
  • The index: idx_user_id_and_log_id(user_id, id)
Israel Fonseca
  • 1,082
  • 10
  • 22
  • If you remove ID '188858' you get the same results? I guess it could be related to query statistics: you remove one ID and suddenly it thinks that 'type' lookup will be more expensive than scanning whole table. I'm not an expert in MySQL, but you can try updating statistics (if possible) or adding 'type' to the index and see if something changes. – Gabrielius Aug 11 '18 at 18:39
  • Yeah, I thought that it could be related to statistics somehow, but I did update it and still the same behavior occurs. When I remove the ID `189854` it gets slow at the same level of using all 5 IDs, which is odd! All the other combinations are fast enough (like removing `188858`). – Israel Fonseca Aug 14 '18 at 19:58
  • For the sake of science try adding column `type` to the index and see what happens then – Gabrielius Aug 15 '18 at 19:13
  • Is `ID` the `PRIMARY KEY`? Let's see `SHOW CREATE TABLE`. – Rick James Aug 25 '18 at 04:55
  • Unfortunately is not so easy just add an index for the sake of science, the write throughput impact of that index creation would affect our systems (and here in Brazil we have some very awful labor time restrictions, that I rather don't mess with). Yes, ID is the primary key (InnoDB). – Israel Fonseca Aug 28 '18 at 19:29

3 Answers3

2

As you have shown, MySQL has two alternative query plans for queries with ORDER BY ... LIMIT n:

  1. Read all qualifying rows, sort them, and pick the n top rows.
  2. Read the rows in sorted order and stop when n qualifying rows have been found.

In order to decide which is the better option, the optimizer needs to estimate the filtering effect of your WHERE condition. This is not straight-forward, especially for columns that are not indexed, or for columns where values are correlated. In your case, one probably has to read a lot more of the table in sorted order in order to find the first 25 qualifying rows than what the optimizer expected.

There have been several improvements in how LIMIT queries are handled, both in later releases of 5.6 (you are running on a pre-GA release!), and in newer releases (5.7, 8.0). I suggest you try to upgrade to a later release, and see if this still is an issue.

In general, if you want to understand query planner decisions, you should look at the optimizer trace for the query.

Øystein Grøvlen
  • 1,266
  • 6
  • 8
0

JOIN is much more efficient.

Create a temporary table with the values of the IN operator. Then make a JOIN between table 'log' to the temporary table of values.

Refer to this answer for more info.

Yakov R.
  • 602
  • 8
  • 22
  • But the answer talks about lots of IDs and huge gaps, which is not the case. The values are constants, not even a product of a subquery. – Israel Fonseca Aug 14 '18 at 20:00
  • Just insert your constants as rows into the temp table. The table will only have one field. – Yakov R. Aug 16 '18 at 20:11
0

Add

INDEX(user_id, type, id),
INDEX(type, user_id, id)

Each of these is a "covering" index. As such, the entire query can be performed by looking only in one index, without touching the 'data'.

I have two choices for the Optimizer -- hopefully it will be able to pick whether user_id IN (...) is more selective or type IN (...) in order to pick the better index.

If, after adding those, you don't have any use for idx_user_id_and_log_id(user_id, id), DROP it.

(No, I can't explain why query 2 chose to do a table scan.)

Rick James
  • 135,179
  • 13
  • 127
  • 222