13

MIN/MAX vs ORDER BY and LIMIT

To follow up on this question: I found some results very different from what Sean McSomething describes:

I have a table with about 300M rows.

Select max(foo) from bar; takes about 15 sec. to run

Select foo from bar order by foo desc limit 1; takes 3 sec. to run

Sean's statement "It looks like MIN() is the way to go - it's faster in the worst case, indistinguishable in the best case" just doesn't hold for this case...but I have no idea why. Can anyone offer an explanation?

Edit: Since I am unable to show the table's structure here: assume that bar is a table in an ndb_cluster with no relations, foo is an arbitrary data point with no index.

Community
  • 1
  • 1
Mr Griever
  • 4,014
  • 3
  • 23
  • 41

4 Answers4

6

To avoid a full pass, add an INDEX on foo column.

aland
  • 1,824
  • 2
  • 26
  • 43
Svisstack
  • 16,203
  • 6
  • 66
  • 100
  • 1
    Would the lack of an index on foo be the reason why MAX runs so slowly? Why? – Mr Griever Nov 11 '10 at 18:22
  • 3
    Because the lack of an index means that a full pass needs to be made on the table. – Vic Nov 11 '10 at 20:52
  • 4
    Assuming there was an index on foo, would `MAX(foo)` or `order by foo DESC limit 1` be faster? – Programster Nov 17 '14 at 11:31
  • 1
    @Programster: I will bet that this doesn't matter, – Svisstack Nov 17 '14 at 15:39
  • An index makes no difference for whether to use `MAX(...)` or `ORDER BY...LIMIT 1`. This answer could have been posted as a comment because it's helpful, but does not answer the question to any extent – Luc Aug 11 '22 at 23:42
2

I came across this question and thought I'd add what I've found. Note that the columns are indexed. I'm running on MariaDB 10.2.14.

I have a query which looks like SELECT MAX(created) FROM tbl WHERE group=0 AND created IS NOT NULL. There's an index on (group,created) (both are ints, but created can be NULL). There are many entries with group=0, not many where created IS NULL. tbl is using the Aria storage engine.

EXPLAIN shows the index is being used and gives a row count of 46312, with extra saying "Using where; Using index"

Running the query takes around 0.692s, but the status has something interesting:

Handler_read_key: 1 Handler_read_next: 45131 Handler_read_prev: 0

This seems to suggest that the key is being fully scanned for the maximum; using MIN instead of MAX seems to give similar results. This seems to suggest that MIN/MAX actually can't make use of the optimisation to just pick the first/last entry of the index here.

However, if the query is changed to SELECT created FROM tbl WHERE group=0 AND created IS NOT NULL ORDER BY created DESC LIMIT 1, whilst the query seems to take about the same amount of time to run, and EXPLAIN shows the same info, the status shows:

Handler_read_key: 1 Handler_read_next: 0 Handler_read_prev: 0

I get similar results if the order by is changed to ASC. It seems to me that using an ORDER BY...LIMIT can skip an index scan, which could lead to faster queries if there are many rows which match the index condition, if my understanding is correct.
Note that for the above results, there's enough RAM and cache allocated for holding all indexes in cache, so, presumably, index scans are fast.

I haven't done any experiments with other conditions (different MySQL versions and storage engines) but I suppose the moral of this story is, checking status of queries via SHOW SESSION STATUS may help provide answers to these things.
At least in this case, the ORDER BY...LIMIT may be more efficient than MIN/MAX even when an index can be used.

zinga
  • 769
  • 7
  • 17
1

I've a similar situation, index on the column in question, and yet the order by & limit solution seems quicker. How good is that :)

widden
  • 129
  • 1
  • 2
0

Index or no index makes no difference for relative comparisons. Of course, one should always add indexes set to get the best performance when reading ("selecting") data.

I have a table where each row is a version of a user. New rows are added for new users, but also for updates to a user.

Listing all users' names on MariaDB 10.3:

ORDER BY ... LIMIT 1: 166, 157, 156, 169, 153, 158 ms

SELECT u.displayName
FROM users u
WHERE u.version = (
    SELECT u2.version
    FROM users u2
    WHERE u2.username = u.username
    ORDER BY u2.version DESC
    LIMIT 1)
ORDER BY u.displayName

MAX(...): 729, 724, 723, 721, 722 ms

SELECT u.displayName
FROM users u
WHERE u.version = (
    SELECT MAX(u2.version)
    FROM users u2
    WHERE u2.username = u.username)
ORDER BY u.displayName

Each milliseconds value is a separate run of the code, busting cache by varying the table aliases randomly (when I don't vary the aliases, subsequent runs are multiple orders of magnitude faster).

I'm very surprised the difference is that large, you'd think this is a fairly common thing / easy thing to optimize. Not that I'm a database developer so I can't say that I would have done any better, but if someone here is a database developer and wants to weigh in, I would certainly be interested if you want to post an answer with the technical difference!

Luc
  • 5,339
  • 2
  • 48
  • 48