
I am observing weird behaviour which I am trying to understand.

MySQL version: 5.7.33

I have the below query:

select * from a_table where time>='2022-05-10' and guid in (102,512,11,35,623,6,21,673);

a_table has a primary key on (time, guid) and a secondary index on guid.
The query above performs very well, and per the EXPLAIN plan it is Using index condition; Using where; Using MRR.
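For context, here is a rough sketch of the table layout described above (the column types are my illustration; only the key layout matters):

```sql
-- Rough sketch of the schema; column types are assumptions
CREATE TABLE a_table (
  time DATETIME NOT NULL,
  guid BIGINT NOT NULL,
  -- ... other columns ...
  PRIMARY KEY (time, guid),
  KEY idx_guid (guid)
) ENGINE=InnoDB;
```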

As I increase the number of values in my IN clause, performance is impacted significantly.

After some dry runs, I was able to get a rough number: for fewer than ~14,500 values the explain plan is the same as above. For more values than that, the explain plan only shows Using where and the query takes forever to run.

In other words, if I put 14,000 values in my IN clause, the explain plan estimates 14,000 rows, as expected. However, if I put 15,000 values in my IN clause, the explain shows 221,200,324 rows. I don't even have that many rows in my whole table.
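For reference, these estimates are the rows values reported by EXPLAIN, e.g.:

```sql
-- The "rows" column of the output is the optimizer's estimate
EXPLAIN
SELECT * FROM a_table
WHERE time >= '2022-05-10'
  AND guid IN (102, 512, 11, 35, 623, 6, 21, 673);
```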

I am trying to understand this behaviour and to find out whether there is any way to fix it.

Thank you

hsnsd
  • How many distinct guids do you have in the table? Is 14,000 5% of that number? – Salman A May 24 '22 at 11:31
  • @SalmanA Nowhere close to 5%. The table has roughly 10 million rows, so 14,000 is about 0.14%. – hsnsd May 24 '22 at 11:35
  • 10 million "distinct" guids? – Salman A May 24 '22 at 11:36
  • @SalmanA yes all distinct – hsnsd May 24 '22 at 11:41
  • #1. When the list of values is large (in your case "large" seems to be roughly above 10k values), store the values in a temporary table, index it, and use it in the query as one more data source (see the sketch below). #2. Check whether it is possible to swap the columns in the PK. If not, create an index on `(guid,time)` instead of the index on `(guid)`. – Akina May 24 '22 at 12:46
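A rough sketch of the temp-table approach from the comment above (the temporary table name is illustrative, and the guid type is assumed to match a_table.guid):

```sql
-- Illustrative sketch of Akina's suggestion: load the values into an
-- indexed temporary table and join to it instead of using a huge IN list.
CREATE TEMPORARY TABLE tmp_guids (
  guid BIGINT NOT NULL PRIMARY KEY  -- type assumed to match a_table.guid
);

INSERT INTO tmp_guids (guid) VALUES (102), (512), (11), (35) /* , ... */;

SELECT a.*
FROM a_table AS a
JOIN tmp_guids AS t ON t.guid = a.guid
WHERE a.time >= '2022-05-10';
```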

1 Answer


Read about Limiting Memory Use for Range Optimization.

When you have a large list of values in an IN() predicate, it uses more memory during the query optimization step. This was considered a problem in some cases, so recent versions of MySQL set a max memory limit (it's 8MB by default).
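You can check what your server is using, for example:

```sql
-- Value is in bytes; the default 8388608 = 8MB
SHOW VARIABLES LIKE 'range_optimizer_max_mem_size';
-- or
SELECT @@range_optimizer_max_mem_size;
```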

If the optimizer finds that it would need more memory than the limit, and there is no other condition in your query it can use to optimize instead, it gives up trying to optimize the range access and resorts to a table scan. I infer that your table statistics actually show the table has ~221 million rows (though table statistics are inexact estimates).
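You can see that estimate yourself; it comes from table statistics, so it is only approximate:

```sql
-- The Rows value is an estimate from table statistics, not an exact count
SHOW TABLE STATUS LIKE 'a_table';
-- or
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'a_table';
```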

I can't say I know the exact formula for how much memory is needed for a given list of values, but given your observed behavior, we can guess it's roughly 600 bytes per item on average (8,388,608 bytes ÷ ~14,500 items ≈ 580 bytes), since ~14k items work and more than that does not.

You can set range_optimizer_max_mem_size = 0 to disable the memory limit. This creates a risk of excessive memory use, but it avoids the optimizer "giving up." We set this value on all MySQL instances at my last job, because we couldn't educate the developers to avoid creating huge lists of values in their queries.
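For example (the GLOBAL form requires the SUPER privilege on 5.7 and does not persist across a restart unless also put in the server config; the SESSION form affects only your connection):

```sql
-- Disable the range optimizer memory limit for the current session only
SET SESSION range_optimizer_max_mem_size = 0;

-- Or disable it server-wide (requires SUPER on 5.7; not persisted across
-- restarts unless also set in my.cnf)
SET GLOBAL range_optimizer_max_mem_size = 0;
```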

Bill Karwin
  • Thank you for the insights, Bill. You are right, it is currently set to 8MB. I don't have sufficient privileges to change this value, so I worked around it by splitting up my query to put only 14k values per query. Nonetheless, this was a good addition to my knowledge and answers my question about why the behaviour of the query was changing! – hsnsd May 25 '22 at 10:14
  • Note you can set `range_optimizer_max_mem_size=0` as a session variable right before you run your query. You don't need any special privileges to do that. It does require writing some code, but you seem ready to write a lot of special-purpose code if you were going to split up the list into batches of 14k. – Bill Karwin May 25 '22 at 16:26
  • We adjusted range_optimizer_max_mem_size to a large value, but there was still a full table scan. Strangely, this kind of problem seems to appear only under live production traffic; we can't reproduce it offline. – egraldlo Aug 02 '23 at 09:03