4

SELECT MAX vs ORDER BY LIMIT 1 question has been answered here several times, but if I add a WHERE clause, things change dramatically

Here is my table:

Field Type Null Key Default Extra
id b'int' 'NO' 'PRI' None 'auto_increment'
'open_time' b'bigint' 'NO' 'UNI' None ''

Note, that both columns are indexed.

And here are the requests:

SELECT id from table
WHERE open_time > 0
ORDER BY id DESC LIMIT 1

SELECT MAX(id) from BTCUSDT1mHist
WHERE open_time > 0

EXPLAIN ANALYZE shows the following: ORDER BY:

-> Limit: 1 row(s)  (cost=0.10 rows=1) (actual time=0.038..0.038 rows=1 loops=1)
    -> Filter: (table.open_time > 0)  (cost=0.10 rows=1) (actual time=0.037..0.037 rows=1 loops=1)
        -> Index scan on table using PRIMARY (reverse)  (cost=0.10 rows=2) (actual time=0.036..0.036 rows=1 loops=1)

MAX():

-> Aggregate: max(table.id)  (cost=325890.06 rows=1081033) (actual time=1025.181..1025.181 rows=1 loops=1)
    -> Filter: (table.open_time > 0)  (cost=217786.76 rows=1081033) (actual time=0.032..866.890 rows=2180645 loops=1)
        -> Index range scan on table using open_time  (cost=217786.76 rows=1081033) (actual time=0.031..705.926 rows=2180645 loops=1)

ORDER BY finishes in 0.0012 seconds, while MAX() does in 1.026 seconds

I have read this question also, but it doesn't seem to cover my situation

The question is: why does MAX() takes so much longer than ORDER BY LIMIT?

O. Jones
  • 103,626
  • 17
  • 118
  • 172
prupru
  • 135
  • 6
  • The query with MAX will first get all the rows with open_time>0, and then check all `id`s to find the max value for `id`. The query with ORDER BY will get the last record using the index on id, then check if open_time>0, and return this records, which is much faster if the first found record happens to have an open_time>0. – Luuk Oct 16 '21 at 12:11
  • @Luuk probably so, but the order of SQL operations should be first WHERE then ORDER BY and LIMIT after that https://www.eversql.com/sql-order-of-operations-sql-query-order-of-execution/ – prupru Oct 16 '21 at 13:57
  • In general you probably are right, but MySQL *sees* `ORDER BY id LIMIT 1`, and decides that reading on the primary key is most efficient, so it evaluate the `WHERE` in a second step. In the other query there is no ORDER BY, but where WHERE clause is using open_time, and there is an index on that field, making MySQL choose that index. – Luuk Oct 16 '21 at 14:04

1 Answers1

3

Compare in the analyze:

    -> Index scan on table using PRIMARY (reverse)  (cost=0.10 rows=2) (actual time=0.036..0.036 rows=1 loops=1)

Versus:

    -> Index range scan on table using open_time  (cost=217786.76 rows=1081033) (actual time=0.031..705.926 rows=2180645 loops=1)

Examining 2 rows must be a lot quicker than examining 2,180,645 rows.

In the query with ORDER BY id DESC LIMIT 1, it uses the primary key index. It starts at the end because it's a reverse order. Then it just iterates down the leaf nodes of the index (in descending order), until it examines the first row that also matches open_time > 0. Then the LIMIT optimization allows the query execution to finish. Based on its statistics, it estimates this will happen after examining 2 rows.

In the query with MAX(id), it uses the index on open_time. But because it's a range condition open_time > 0, it can't assume the maximum id is found at the start or end of that range. So it must examine every matching entry in the open_time index, searching for the greatest value of id (primary keys are implicitly part of a secondary index). There's no chance of early-termination as there is in the query with LIMIT.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • thanks for the answer! So, without LIMIT optimization, both MAX and ORDER LIMIT approaches would give more or less similar results? – prupru Oct 16 '21 at 17:02
  • I suppose so, but MySQL performs the LIMIT optimization automatically in this query. – Bill Karwin Oct 16 '21 at 17:10