45

I have a table, let's call it "foos", with almost 6 million records in it. I am running the following query:

SELECT "foos".*
FROM "foos"
INNER JOIN "bars" ON "foos".bar_id = "bars".id
WHERE (("bars".baz_id = 13266))
ORDER BY "foos"."id" DESC
LIMIT 5 OFFSET 0;

This query takes a very long time to run (Rails times out while running it). There is an index on all IDs in question. The curious part is, if I remove either the ORDER BY clause or the LIMIT clause, it runs almost instantaneously.

I'm assuming that the presence of both ORDER BY and LIMIT are making PostgreSQL make some bad choices in query planning. Anyone have any ideas on how to fix this?

In case it helps, here is the EXPLAIN for all 3 cases:

//////// Both ORDER and LIMIT
SELECT "foos".*
FROM "foos"
INNER JOIN "bars" ON "foos".bar_id = "bars".id
WHERE (("bars".baz_id = 13266))
ORDER BY "foos"."id" DESC
LIMIT 5 OFFSET 0;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..16663.44 rows=5 width=663)
   ->  Nested Loop  (cost=0.00..25355084.05 rows=7608 width=663)
         Join Filter: (foos.bar_id = bars.id)
         ->  Index Scan Backward using foos_pkey on foos  (cost=0.00..11804133.33 rows=4963477 width=663)
               Filter: (((NOT privacy_protected) OR (user_id = 67962)) AND ((status)::text = 'DONE'::text))
         ->  Materialize  (cost=0.00..658.96 rows=182 width=4)
               ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
                     Index Cond: (baz_id = 13266)
(8 rows)

//////// Just LIMIT
SELECT "foos".*
FROM "foos"
INNER JOIN "bars" ON "foos".bar_id = "bars".id
WHERE (("bars".baz_id = 13266))
LIMIT 5 OFFSET 0;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..22.21 rows=5 width=663)
   ->  Nested Loop  (cost=0.00..33788.21 rows=7608 width=663)
         ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
               Index Cond: (baz_id = 13266)
         ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.00..181.51 rows=42 width=663)
               Index Cond: (foos.bar_id = bars.id)
               Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
(7 rows)

//////// Just ORDER
SELECT "foos".*
FROM "foos"
INNER JOIN "bars" ON "foos".bar_id = "bars".id
WHERE (("bars".baz_id = 13266))
ORDER BY "foos"."id" DESC;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=36515.17..36534.19 rows=7608 width=663)
   Sort Key: foos.id
   ->  Nested Loop  (cost=0.00..33788.21 rows=7608 width=663)
         ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
               Index Cond: (baz_id = 13266)
         ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.00..181.51 rows=42 width=663)
               Index Cond: (foos.bar_id = bars.id)
               Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
(8 rows)
jakeboxer
  • 3,300
  • 4
  • 26
  • 27

4 Answers4

21

When you have both the LIMIT and ORDER BY, the optimizer has decided it is faster to limp through the unfiltered records on foo by key descending until it gets five matches for the rest of the criteria. In the other cases, it simply runs the query as a nested loop and returns all the records.

Offhand, I'd say the problem is that PG doesn't grok the joint distribution of the various ids and that's why the plan is so sub-optimal.

For possible solutions: I'll assume that you have run ANALYZE recently. If not, do so. That may explain why your estimated times are high even on the version that returns fast. If the problem persists, perhaps run the ORDER BY as a subselect and slap the LIMIT on in an outer query.

Andrew Lazarus
  • 18,205
  • 3
  • 35
  • 53
  • ok... so `foos.bars.last` results in a full index scan on bars... nice -_- – Jim Nov 04 '16 at 18:56
  • 1
    ok... so this results in a full index scan only if foos have 0 bars... still annoying though – Jim Nov 04 '16 at 19:19
  • solution: `foos.bars.last unless foos.bars.empty?`... still annoyed though – Jim Nov 04 '16 at 19:22
  • @Jim What do you mean by `last` and `empty` here: doesn't look like SQL. – Andrew Lazarus Nov 04 '16 at 20:25
  • Oh like the OP I hit the issue with an active_record (rails) generated Postgres query. – Jim Nov 04 '16 at 20:41
  • If you can convert to SQL (and I suggest a fresh question), someone will look. – Andrew Lazarus Nov 04 '16 at 21:43
  • As of PostgreSQL 10, running the `ORDER BY` in a subquery no longer makes a difference. Instead, you have to use a CTE: `WITH (... query ... ORDER BY ...) AS t: SELECT * FROM t LIMIT ...`. – jwodder Oct 07 '18 at 15:19
  • 1
    For anyone having performance issues with Django, this is exactly what it does by default with the method `first()`: `ORDER BY LIMIT 1`. – augustomen Jul 26 '19 at 18:50
  • **"perhaps run the ORDER BY as a subselect and slap the LIMIT on in an outer query"** made all the difference for me. It also feels cleaner. In contrast: [adding an unneeded `ORDER BY`](https://stackoverflow.com/a/27237698/1717535) helped me on Postgres 9.6 but didn't do the trick after importing a dump on Postgres 13. – Fabien Snauwaert Nov 13 '21 at 17:43
3

Probably it happens because before it tries to order then to select. Why do not try to sort the result in an outer select all? Something like: SELECT * FROM (SELECT ... INNER JOIN ETC...) ORDER BY ... DESC

Davide Ungari
  • 1,920
  • 1
  • 13
  • 24
2

Your query plan indicates a filter on

(((NOT privacy_protected) OR (user_id = 67962)) AND ((status)::text = 'DONE'::text))

which doesn't appear in the SELECT - where is it coming from?

Also, note that expression is listed as a "Filter" and not an "Index Cond" which would seem to indicate there's no index applied to it.

ic3b3rg
  • 14,629
  • 4
  • 30
  • 53
0

it may be running a full-table scan on "foos". did you try changing the order of the tables and instead use a left-join instead of inner-join and see if it displays results faster.

say...

SELECT "bars"."id", "foos".*
FROM "bars"
LEFT JOIN "foos" ON "bars"."id" = "foos"."bar_id"
WHERE "bars"."baz_id" = 13266
ORDER BY "foos"."id" DESC
LIMIT 5 OFFSET 0;
Christian Noel
  • 305
  • 4
  • 14