
I have one table, accounts, with the following indexes:

accounts {
    id  text
    num_id  bigint
    pid text
    fid text
    created_at  timestamp with time zone
    updated_at  timestamp with time zone
}

CREATE UNIQUE INDEX accounts_pkey ON public.accounts USING btree (id);
CREATE INDEX fid_idx ON public.accounts USING btree (fid);
CREATE INDEX idx_accounts_pid_fid ON public.accounts USING btree (pid, fid);

And this query is slow:

explain analyse SELECT * FROM accounts
WHERE pid = 'hd' AND fid = '123'
ORDER BY  id ASC
LIMIT 1;
Limit  (cost=0.56..3173.34 rows=1 width=123) (actual time=49389.351..49389.351 rows=0 loops=1)
  ->  Index Scan using accounts_pkey on accounts  (cost=0.56..5022497.13 rows=1583 width=123) (actual time=49389.350..49389.350 rows=0 loops=1)
        Filter: ((pid = 'hd'::text) AND (fid = '123'::text))
        Rows Removed by Filter: 56821193
Planning time: 0.094 ms
Execution time: 49389.368 ms

Per this answer, it may be solved by appending the (otherwise unneeded) pid and fid columns to the ORDER BY clause:

explain analyse SELECT * FROM accounts
WHERE pid = 'hd' AND fid = '123'
ORDER BY  id ASC, pid, fid
LIMIT 1;

However, it does not help; the plan is essentially the same:

Limit  (cost=0.56..3173.37 rows=1 width=123) (actual time=49495.236..49495.236 rows=0 loops=1)
  ->  Index Scan using accounts_pkey on accounts  (cost=0.56..5022556.07 rows=1583 width=123) (actual time=49495.234..49495.234 rows=0 loops=1)
        Filter: ((pid = 'hd'::text) AND (fid = '123'::text))
        Rows Removed by Filter: 56821555
Planning time: 0.096 ms
Execution time: 49495.253 ms

Is there something I am missing?

PostgreSQL version: 9.6.8

  • Just out of curiosity, what is the running time of `SELECT * FROM accounts ORDER BY id LIMIT 1`? – Tim Biegeleisen Sep 20 '19 at 04:33
  • @TimBiegeleisen, the running time of `SELECT * FROM accounts ORDER BY id LIMIT 1` is `Limit (cost=0.56..0.65 rows=1 width=123) (actual time=0.010..0.010 rows=1 loops=1) -> Index Scan using accounts_pkey on accounts (cost=0.56..4738719.60 rows=56980788 width=123) (actual time=0.010..0.010 rows=1 loops=1) Planning time: 0.078 ms Execution time: 0.027 ms ` – zangw Sep 20 '19 at 05:14
  • I attempted an answer below which hopefully partially explains what you are seeing. I don't know why Postgres is choosing this execution plan, but a slight change in index definition might get around all of the problems. – Tim Biegeleisen Sep 20 '19 at 05:22

1 Answer


From your comments, the following query is in fact quite performant:

SELECT *
FROM accounts
ORDER BY id
LIMIT 1;

The reason this performs well is that the LIMIT and ORDER BY are the only steps Postgres needs to perform before returning a row, and the accounts_pkey unique index already stores id values in sorted order. Postgres only needs to find the lowest id value at the left edge of that index, then fetch the single matching row from the table heap to satisfy the SELECT *.
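
As a quick way to see this single-probe behavior (my own illustration, not from the original post), asking for just the minimum id is answered the same way:

-- Postgres rewrites min(id) into an ordered scan of accounts_pkey with LIMIT 1,
-- so it reads a single index entry instead of the whole table
SELECT min(id) FROM accounts;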

However, the query in your question is a bit different:

SELECT *
FROM accounts
WHERE pid = 'hd' AND fid = '123'
ORDER BY id ASC
LIMIT 1;

In this case, Postgres is choosing to scan the entire accounts_pkey index in id order, applying your WHERE clause as a filter at each step. Because accounts_pkey only covers the id column, Postgres must fetch every visited row from the table heap just to check the values of pid and fid. The idea behind this plan is to start at the lowest id and walk the index until the first row matching pid and fid turns up; since no row matches at all here, the scan visits all 56 million rows before giving up. Regardless of what Postgres is deciding to do, the following covering index could help here:

CREATE INDEX idx_accounts_cover ON public.accounts USING btree (pid, fid, id);

Given that the roughly 57 million non-matching rows could now be skipped entirely using the above index, the remaining LIMIT/ORDER BY operation on id becomes cheap: within a fixed (pid, fid) prefix, the index entries are already ordered by id. And since this index also covers id, Postgres would only have to seek back to the table heap once, at the very end of the query.
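
A sketch of how this could be rolled out and verified (the CONCURRENTLY option and the follow-up EXPLAIN are additions for illustration, not part of the original answer):

-- Build the covering index without blocking writes on a busy table
CREATE INDEX CONCURRENTLY idx_accounts_cover
    ON public.accounts USING btree (pid, fid, id);

-- Re-check the plan to see which index the planner now picks
EXPLAIN ANALYZE
SELECT * FROM accounts
WHERE pid = 'hd' AND fid = '123'
ORDER BY id ASC
LIMIT 1;

-- idx_accounts_pid_fid is a prefix of the new index and is likely redundant
-- DROP INDEX idx_accounts_pid_fid;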

    Possibly extremely low cardinality on the `pid, fid` index? (just because this one case has reasonable cardinality doesn't exclude the other combinations having very low cardinality...) – MatBailie Sep 20 '19 at 05:35
  • @MatBailie Yes, I was also thinking this, and wanted to see a sample of the OP's data. – Tim Biegeleisen Sep 20 '19 at 05:36
  • @TimBiegeleisen, I did not get a chance to add this index `(pid, fid, id)` in the production env. However, I tested it in the beta env. After adding this index, the explain result is `-> Sort (cost=8.39..8.39 rows=1 width=119) (actual time=0.028..0.028 rows=0 loops=1) Sort Key: id Sort Method: quicksort Memory: 25kB -> Index Scan using fid_idx on accounts (cost=0.29..8.38 rows=1 width=119) (actual time=0.020..0.020 rows=0 loops=1) Index Cond: (fid = '123'::text) Filter: (pid = 'hd'::text)`. `fid_idx` is used rather than `idx_accounts_cover` – zangw Sep 20 '19 at 08:42
  • @zangw OK...the thing here is that the fastest Postgres could be would be to walk down the index on `id` and stop when finding the first matching record. But, if very few of your millions of records match `pid` and `fid`, then this process could take some time, regardless of strategy used. – Tim Biegeleisen Sep 20 '19 at 09:03
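
A sketch of how the cardinality question raised in these comments could be checked (the queries are illustrative assumptions, not from the thread):

-- How many rows actually match the predicate?
SELECT count(*) FROM accounts WHERE pid = 'hd' AND fid = '123';

-- The planner's view of the column distributions (requires a recent ANALYZE)
SELECT attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'accounts' AND attname IN ('pid', 'fid');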