
I stumbled upon this today and was quite shocked. When searching Google I normally see the question asked the other way around: using limit caused the query to return slower.

I have a MySQL table with a few million rows in it. The PK is id and as such it's a unique index.

When I performed a query of the form select a, b, c, ... from table where id in (1, 2, 3, ..., 5000), it took about 15-20 minutes to fetch all the results. However, when I simply added limit 1000000 at the end (I deliberately used a far larger number than needed), it returned in a few seconds.
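
In simplified form (placeholder column and table names), the two statements were:

select a, b, c from `table` where id in (1, 2, 3, /* ..., */ 5000);                 -- 15-20 minutes
select a, b, c from `table` where id in (1, 2, 3, /* ..., */ 5000) limit 1000000;   -- a few seconds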

I know that using limit with a number smaller than the result set helps, since the query returns as soon as the "quota" is filled, but here I can't find the reason for such a dramatic improvement.
Can anyone please explain it?
Should I just add a limit to every query to improve its performance?
Why doesn't MySQL execute the search the same way with and without it?

Update

As requested, the EXPLAIN for each:
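
Both plans were produced with statements of roughly this form (names simplified):

explain select a, b, c from `table` where id in (1, 2, 3, /* ..., */ 5000) limit 1000000;
explain select a, b, c from `table` where id in (1, 2, 3, /* ..., */ 5000);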

With limit (takes a few seconds)
{
    "id" : 1,
    "select_type" : "SIMPLE",
    "table" : "table",
    "partitions" : null,
    "type" : "range",
    "possible_keys" : "PRIMARY",
    "key" : "PRIMARY",
    "key_len" : "4",
    "ref" : null,
    "rows" : 4485,
    "filtered" : 100.0,
    "Extra" : "Using where"
}

Without limit (takes 15-20 minutes)
{
    "id" : 1,
    "select_type" : "SIMPLE",
    "table" : "table",
    "partitions" : null,
    "type" : "ALL",
    "possible_keys" : "PRIMARY",
    "key" : null,
    "key_len" : null,
    "ref" : null,
    "rows" : 69950423,
    "filtered" : 50.0,
    "Extra" : "Using where"
}

I'm not fluent in reading these, but it looks like MySQL used the key when I used limit and didn't when I ran without it.
There may be other meaningful differences in the filtered and type fields, which I don't know how to interpret.
How come?

Update 2

A lot of questions were asked, so I'll attempt to answer them all.

The MySQL version is 8.0.28 and the table engine is InnoDB.
I've run the tests a few times, one after the other, not only once.

Running the same EXPLAIN with fewer values (10) in the IN clause returned the same plan both with limit and without it!

{
    "id" : 1,
    "select_type" : "SIMPLE",
    "table" : "table",
    "partitions" : null,
    "type" : "range",
    "possible_keys" : "PRIMARY",
    "key" : "PRIMARY",
    "key_len" : "4",
    "ref" : null,
    "rows" : 10,
    "filtered" : 100.0,
    "Extra" : "Using where"
}

Now the EXPLAIN FORMAT=JSON output (with redacted parts):

Without limit
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "8369910.88"
    },
    "table": {
      "table_name": "table",
      "access_type": "ALL",
      "possible_keys": [
        "PRIMARY"
      ],
      "rows_examined_per_scan": 70138598,
      "rows_produced_per_join": 35069299,
      "filtered": "50.00",
      "cost_info": {
        "read_cost": "4862980.98",
        "eval_cost": "3506929.90",
        "prefix_cost": "8369910.88",
        "data_read_per_join": "558G"
      },
      "used_columns": [...],
      "attached_condition": "(`db`.`table`.`id` in (...))"
    }
  }
}

With limit
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "8371410.92"
    },
    "table": {
      "table_name": "table",
      "access_type": "range",
      "possible_keys": [
        "PRIMARY"
      ],
      "key": "PRIMARY",
      "used_key_parts": [
        "id"
      ],
      "key_length": "4",
      "rows_examined_per_scan": 4485,
      "rows_produced_per_join": 35069255,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "4864485.17",
        "eval_cost": "3506925.54",
        "prefix_cost": "8371410.92",
        "data_read_per_join": "558G"
      },
      "used_columns": [...],
      "attached_condition": "(`db`.`table`.`id` in (...))"
    }
  }
}

PeNpeL
  • Did you check the Explain plan for each? – Stu Nov 17 '22 at 13:57
  • Did you disable caching for your test? – Will Palmer Nov 17 '22 at 13:58
  • Does this answer your question? [Why is MySQL slow when using LIMIT in my query?](https://stackoverflow.com/questions/17747871/why-is-mysql-slow-when-using-limit-in-my-query) – Nico Haase Nov 17 '22 at 14:02
  • Or this? https://stackoverflow.com/questions/15460133/mysql-dramatically-slower-query-execution-if-use-limit-1-instead-of-limit-5 – Nico Haase Nov 17 '22 at 14:02
  • Or this? https://stackoverflow.com/questions/13846691/adding-limit-clause-to-mysql-query-slows-it-down-dramatically – Nico Haase Nov 17 '22 at 14:02
  • @NicoHaase All of those are exactly what I described when I searched for this myself: using limit caused it to return slower. So no, they don't answer my question, as in my case it was much faster, not slower. – PeNpeL Nov 17 '22 at 15:07
  • @WillPalmer I don't think it was caching since running the same query without limit one after the other took about the same 15-20 minutes, and other teammates experienced the same when running with and without limit after my first run – PeNpeL Nov 17 '22 at 15:09
  • You can also try setting `range_optimizer_max_mem_size=0` which means don't constrain the memory used for analyzing the long `IN()` list. I've seen cases where very long lists of values cause the optimizer to give up and resort to a table-scan, so we had to lift the limit on memory use. Read https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html#range-optimization-memory-use for details. – Bill Karwin Nov 17 '22 at 19:03
  • The ordinary buffer_pool can lead to a 10x speedup. Please run _each_ test twice. – Rick James Nov 17 '22 at 22:32
  • @NicoHaase - At least 2 of those 3 references are to queries using MyISAM tables. They were 9 years ago. MyISAM is mostly gone, and should not be considered today. – Rick James Nov 17 '22 at 23:00
  • @MarekKnappe done. It does seem like the very long `IN` clause is what caused this, possibly per @Bill's suggestion; I'm just not sure why it happens. – PeNpeL Nov 20 '22 at 09:47

1 Answer


As there is a very long thread in the comments under the post, I'll add the answer here; it's both mine and @Bill's. The issue looks to be the very long argument list in the `IN()` part of the statement.

The fix is to raise the `range_optimizer_max_mem_size` parameter so it can accommodate more values in `IN()`, as exceeding that limit causes MySQL to fall back to a full table scan.

The range optimizer reserves memory for range scanning, so not setting aside enough of that memory will result in a full table scan.
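
A minimal sketch of checking and lifting that limit, following the range-optimization docs Bill linked in the comments (0 means unconstrained; the change below is session-scoped):

-- show the current limit (the MySQL 8.0 default is 8388608 bytes, i.e. 8M)
select @@range_optimizer_max_mem_size;

-- let the range optimizer use as much memory as it needs in this session
set session range_optimizer_max_mem_size = 0;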

Now, why does the LIMIT clause make this happen? Here I can only guess:

  • LIMIT is forcing MySQL to use a different range scan type
  • LIMIT actually caps the number of rows that will be returned, so MySQL knows it will not return more than X; without it, the optimizer assumes it could return 69950423 rows, which would exceed some other memory limits you have set up. It's worth trying with a limit equal to the number of rows in the table; see the verification sketch below.
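
A quick way to test whether the memory limit really is the culprit: lift it and re-run the EXPLAIN for the slow query (names simplified as before):

set session range_optimizer_max_mem_size = 0;
explain format=json
select a, b, c from `table` where id in (1, 2, 3, /* ..., */ 5000);
-- if the memory limit was the cause, "access_type" should now be "range"
-- with "key": "PRIMARY", even without any LIMIT clause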
Marek Knappe
  • Thanks @Marek, but why does using `limit`, even with a much higher value, make the optimizer handle that long list of `IN` values differently? – PeNpeL Nov 21 '22 at 10:52