12

Here's a slow query on Postgres 9.1.6, even though the maximum count is 2, with both rows already identified by their primary keys: (4.5 seconds)

EXPLAIN ANALYZE SELECT COUNT(*) FROM tbl WHERE id IN ('6d48fc431d21', 'd9e659e756ad') AND data ? 'building_floorspace' AND data ?| ARRAY['elec_mean_monthly_use', 'gas_mean_monthly_use'];
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4.09..4.09 rows=1 width=0) (actual time=4457.886..4457.887 rows=1 loops=1)
   ->  Index Scan using idx_tbl_on_data_gist on tbl  (cost=0.00..4.09 rows=1 width=0) (actual time=4457.880..4457.880 rows=0 loops=1)
         Index Cond: ((data ? 'building_floorspace'::text) AND (data ?| '{elec_mean_monthly_use,gas_mean_monthly_use}'::text[]))
         Filter: ((id)::text = ANY ('{6d48fc431d21,d9e659e756ad}'::text[]))
 Total runtime: 4457.948 ms
(5 rows)

Hmm, maybe if I do a subquery with just the primary key part first...: (nope, still 4.5+ seconds)

EXPLAIN ANALYZE SELECT COUNT(*) FROM (  SELECT * FROM tbl WHERE id IN ('6d48fc431d21', 'd9e659e756ad')  ) AS t WHERE data ? 'building_floorspace' AND data ?| ARRAY['elec_mean_monthly_use', 'gas_mean_monthly_use'];
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4.09..4.09 rows=1 width=0) (actual time=4854.170..4854.171 rows=1 loops=1)
   ->  Index Scan using idx_tbl_on_data_gist on tbl  (cost=0.00..4.09 rows=1 width=0) (actual time=4854.165..4854.165 rows=0 loops=1)
         Index Cond: ((data ? 'building_floorspace'::text) AND (data ?| '{elec_mean_monthly_use,gas_mean_monthly_use}'::text[]))
         Filter: ((id)::text = ANY ('{6d48fc431d21,d9e659e756ad}'::text[]))
 Total runtime: 4854.220 ms
(5 rows)

How can I prevent Postgres from inlining the subquery?

Background: I have a Postgres 9.1 table using hstore and with a GiST index on it.

Seamus Abshere
  • 8,326
  • 4
  • 44
  • 61
  • 1
    The cost estimation on scanning the gist index appears to be quite wrong (at 4.09, it's lower than accessing the primary key's index). You might check if you can do something about it, like ensuring fresh statistics and/or increase the [statistics target](http://www.postgresql.org/docs/9.1/static/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET). – Daniel Vérité Feb 15 '13 at 19:15
  • I increased the target to 9999 and ran `VACUUM ANALYZE`... still getting the same cost estimate versus reality discrepancy. Thanks for your suggestion, though! – Seamus Abshere Feb 15 '13 at 21:51

2 Answers2

13

I think OFFSET 0 is the better approach since it's more obviously a hack showing that something weird is going on, and it's unlikely we'll ever change the optimiser behaviour around OFFSET 0 ... wheras hopefully CTEs will become inlineable at some point CTEs became inlineable by default in PostgreSQL 12. The following explanation is for completeness's sake; use Seamus's answer.

For uncorrelated subqueries you could exploit PostgreSQL 11 and older's refusal to inline WITH query terms to rephrase your query as:

WITH t AS (
    SELECT * FROM tbl WHERE id IN ('6d48fc431d21', 'd9e659e756ad')
)
SELECT COUNT(*) 
FROM t 
WHERE data ? 'building_floorspace' 
AND data ?| ARRAY['elec_mean_monthly_use', 'gas_mean_monthly_use'];

This has much the same effect as the OFFSET 0 hack, and like the offset 0 hack exploits quirks in Pg's optimizer that people use to get around Pg's lack of query hints ... by using them as query hints.

But the OFFSET 0 hack is somewhat officially blessed, wheras CTE abuse doesn't work anymore in PostgreSQL 12. (Yay!).

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • Postgres since v12 inlines CTEs by default. – Justin Feb 15 '20 at 02:48
  • I don't understand what `OFFSET 0` does. I have a query (without subqueries) with limit 10 that runs really fast when index is being picked. But same query, on same data set sometimes uses index and sometimes not leading to rows scan of 10m records. Could adding `OFFSET 0` to such query help it to always pick index and if yes then why? – Kamil Dziedzic Dec 09 '20 at 09:35
  • @KamilDziedzic It's a hack that stops PostgreSQL inlining across subquery boundaries. – Craig Ringer Dec 15 '20 at 06:55
8

Apparently there's a way to tell Postgres not to inline: (0.223ms!)

EXPLAIN ANALYZE SELECT COUNT(*) FROM (  SELECT * FROM tbl WHERE id IN ('6d48fc431d21', 'd9e659e756ad') OFFSET 0 ) AS t WHERE data ? 'building_floorspace' AND data ?| ARRAY['elec_mean_monthly_use', 'gas_mean_monthly_use'];
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.14..8.15 rows=1 width=0) (actual time=0.165..0.166 rows=1 loops=1)
   ->  Subquery Scan on t  (cost=4.14..8.14 rows=1 width=0) (actual time=0.160..0.160 rows=0 loops=1)
         Filter: ((t.data ? 'building_floorspace'::text) AND (t.data ?| '{elec_mean_monthly_use,gas_mean_monthly_use}'::text[]))
         ->  Limit  (cost=4.14..8.13 rows=2 width=496) (actual time=0.086..0.092 rows=2 loops=1)
               ->  Bitmap Heap Scan on tbl  (cost=4.14..8.13 rows=2 width=496) (actual time=0.083..0.086 rows=2 loops=1)
                     Recheck Cond: ((id)::text = ANY ('{6d48fc431d21,d9e659e756ad}'::text[]))
                     ->  Bitmap Index Scan on tbl_pkey  (cost=0.00..4.14 rows=2 width=0) (actual time=0.068..0.068 rows=2 loops=1)
                           Index Cond: ((id)::text = ANY ('{6d48fc431d21,d9e659e756ad}'::text[]))
 Total runtime: 0.223 ms
(9 rows)

The trick is OFFSET 0 in the subquery.

Seamus Abshere
  • 8,326
  • 4
  • 44
  • 61
  • ... but PostgreSQL doesn't have query hints. No. Really. ;-) – Craig Ringer Feb 15 '13 at 15:27
  • I have similar issue but with single query, postgres decides to do full table scan instead of index scan leading to killing database. And it does it only sometimes. The returned number of rows is set to 10, and it fails even when there is none to return. Whenever it picks index - extremely fast, picks to use table scan - rows scan over millions of records. I wonder if `OFFSET 0` can help when this is not a subquery? – Kamil Dziedzic Dec 09 '20 at 09:15