Get top row(s) with highest value, with ties

Question

I have a relation in PostgreSQL named product which contains 2 fields: id and quantity.

I want to find the id of the products with the highest quantity. As far as I know, there are 2 ways of doing it:

SELECT id FROM product WHERE quantity >= ALL(SELECT quantity FROM product)

or

SELECT id FROM product WHERE quantity = (SELECT MAX(quantity) FROM product)

Is there any difference in their speed of execution?

@GMB I get just the same results at the same time, but I doubt that select the max quantity first will give me more advantage when talking about time — Đào Minh Dũng, Jul 30 '20 at 17:30
Test, Test, Test, and get explain plans. Unless your first query turns out to be *significantly* faster go with the second (select max...). It seems that it more clearly indicates your intent (IMHO anyway). — Belayer, Jul 30 '20 at 17:55

Erwin Brandstetter · Accepted Answer · 2022-02-11T03:41:02.387

The first query fails if any row has quantity IS NULL (as Gordon demonstrates).
The second query only fails if all rows have quantity IS NULL. So it should be usable in most cases. (And it's faster.)

Postgres 13 or newer

Use the standard SQL clause WITH TIES:

SELECT id
FROM   product
ORDER  BY quantity DESC NULLS LAST
FETCH  FIRST 1 ROWS WITH TIES;

db<>fiddle here

Works with any amount of NULL values.

The manual:

SQL:2008 introduced a different syntax to achieve the same result, which PostgreSQL also supports. It is:
OFFSET start { ROW | ROWS }
FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
In this syntax, the start or count value is required by the standard to be a literal constant, a parameter, or a variable name; as a PostgreSQL extension, other expressions are allowed, but will generally need to be enclosed in parentheses to avoid ambiguity. If count is omitted in a FETCH clause, it defaults to 1. The WITH TIES option is used to return any additional rows that tie for the last place in the result set according to the ORDER BY clause; ORDER BY is mandatory in this case. ROW and ROWS as well as FIRST and NEXT are noise words that don't influence the effects of these clauses.

Notably, WITH TIES cannot be used with the (non-standard) short syntax LIMIT n.

It's the fastest possible solution. Faster than either of your current queries. More important for performance: have an index on (quantity). Or a more specialized covering index to allow index-only scans (a bit faster, yet):

CREATE INDEX ON product (quantity DESC NULLS LAST) INCLUDE (id);

See:

Do covering indexes in PostgreSQL help JOIN columns?

We need NULLS LAST to keep NULL values last in descending order. See:

Sort by column ASC, but NULL values first?

Postgres 12 or older

A NULL-safe query:

SELECT id, quantity
FROM   product
WHERE  quantity IS NOT DISTINCT FROM (SELECT MAX(quantity) FROM product);

Or, probably faster:

SELECT id, quantity
FROM  (
   SELECT *, rank() OVER (ORDER BY quantity DESC NULLS LAST) AS rnk
   FROM   product
   ) sub
WHERE  rnk = 1;

See:

PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?

Faster alternatives for big tables:

Equivalent for FETCH FIRST WITH TIES in PostgreSQL 11 with comparable performance

score 1 · Answer 2 · answered Jul 30 '20 at 18:02

1

I tried your methods in postgres (test table distributed by id). That first method ran much slower for me. Here were my comparison results:

Method 1 above: 3.1 seconds

Method 2 above: 0.13 seconds

Method 1 was at least 10 times slower in repeated efforts. I think your method 2 is the better option, as the sub-query likely runs much faster than the sub-query in the other option.

answered Jul 30 '20 at 18:02

Isolated

5,169
1
6
18

Thank you. Can you give me more details on the size of the table that you worked on? I tried the above query with a small table and could not see the differences in running time – Đào Minh Dũng Jul 30 '20 at 18:07
The table was 2 columns, with somewhere between 900,000 and 3 million rows. I don't recall for certain. – Isolated Jul 31 '20 at 21:30

score 1 · Answer 3 · answered Jul 30 '20 at 18:35

1

Your queries are NOT equivalent. The first returns no rows at all if any of the quantity values are NULL. The second ignores NULL values.

Here is a db<>fiddle illustrating this.

answered Jul 30 '20 at 18:35

Gordon Linoff

1,242,037
58
646
786

Thank you, I did not know that. If all of the quantity values are not NULL, which should be the faster one? – Đào Minh Dũng Jul 30 '20 at 19:08
@ĐàoMinhDũng . . . I think Isolated answered that question. – Gordon Linoff Jul 30 '20 at 22:20

score 1 · Answer 4 · answered Jul 30 '20 at 18:46

1

there is the 3rd variant

SELECT id FROM product 
WHERE quantity = (SELECT quantity FROM product ORDER BY quantity DESC NULLS LAST LIMIT 1)

if the table has btree index as (quantity DESC NULLS LAST) this variant will be super-fast

answered Jul 30 '20 at 18:46

Игорь Тыра

905
4
12

Get top row(s) with highest value, with ties

4 Answers4

Postgres 13 or newer

Postgres 12 or older

Linked