4

I have a table in PosgreSQL 9.1.9. There's a schema:

CREATE TABLE chpl_text
(
  id integer NOT NULL DEFAULT nextval('chpl_text_id_seq1'::regclass),
  page_id bigint NOT NULL,
  page_idx integer NOT NULL,
  ...
);

I have around 40000000 (40M) rows in this table. Now, there's a query:

SELECT
  ...
FROM chpl_text
ORDER BY id
LIMIT 100000
OFFSET N

Everything runs smoothly for N <= 5300000. Execution plan looks like this

Limit  (cost=12743731.26..12984179.02 rows=100000 width=52)
  ->  Index Scan using chpl_text_pkey on chpl_text t  (cost=0.00..96857560.86 rows=40282164 width=52)

But for N >= 5400000 it magically changes into

Limit  (cost=13042543.16..13042793.16 rows=100000 width=52)
  ->  Sort  (cost=13029043.16..13129748.57 rows=40282164 width=52)
        Sort Key: id
        ->  Seq Scan on chpl_text t  (cost=0.00..1056505.64 rows=40282164 width=52)

Resulting in very long runtime.

How can I prevent postresql from changeing query plan for higher offsets?

Note: I am aware, that big offsets are not good at all, but I am forced to use them here.

Crazy Yoghurt
  • 2,365
  • 2
  • 26
  • 37
  • 1
    Are you using the default configuration values? You should try to increase the ammount of work_mem and other memory related parameters. – Jakub Kania Mar 26 '14 at 13:26
  • As per postgres DOC (http://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-LIMIT) this is expected. See a similar post here http://dba.stackexchange.com/questions/32956/execution-plan-in-simple-query-in-pgsql-depends-on-offset-clause-why. – Rahul Mar 26 '14 at 13:29
  • Can you repeat those with EXPLAIN (ANALYZE,BUFFERS) ? – jjanes Mar 26 '14 at 15:51

1 Answers1

1

If Postgres is configured halfway decently, your statistics are up to date (ANALYZE or autovacuum) and the cost settings are sane, Postgres generally knows better when to do an index scan or a sequential scan. Details and links:
Keep PostgreSQL from sometimes choosing a bad query plan

To actually test performance without sequential scan, "disable" it (in a debug session only!)

SET enable_seqscan=OFF;

More in the manual.

Then run EXPLAIN ANALYZE again ...

Also, the release of Postgres 9.2 had a focus on "big data". With your given use case you should urgently consider upgrading to the current release.

You can also try this alternative query with a CTE and row_number() and see if the query plan turns out more favorable:

WITH cte AS (
   SELECT ..., row_number() OVER (ORDER BY id) AS rn
   FROM   chpl_text
   )
SELECT ...
FROM   cte
WHERE  rn BETWEEN N+1 AND N+100000
ORDER  BY id;

That's not always the case, but might be in your special situation.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228