
I created a test table in my PostgreSQL database to illustrate the question. Table name: `test`; columns: `id` (varchar, containing UUIDs) and `created_at` (timestamp).
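For reference, the schema can be reproduced with something like the following sketch. The index names `test_pkey` and `index_create_at` are taken from the EXPLAIN output below; everything else is an assumption:

```sql
-- Hypothetical reconstruction of the test schema; column types come
-- from the question, index names from the EXPLAIN plans below.
create table test (
    id         varchar primary key,          -- backed by index test_pkey
    created_at timestamp without time zone
);

create index index_create_at on test (created_at);
```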

I inserted some mock data into the table.
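The mock data could be generated with something like this (an assumed sketch; the question does not show the actual insert statement):

```sql
-- Insert ~1.2 million rows with random UUIDs and timestamps spread
-- over a short interval. gen_random_uuid() is built in from
-- PostgreSQL 13; on older versions it requires the pgcrypto extension.
insert into test (id, created_at)
select gen_random_uuid()::varchar,
       timestamp '2020-04-17 08:00:00' + (n || ' milliseconds')::interval
from generate_series(1, 1200001) as n;
```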

postgres=# select count(1) from test;
  count
---------
 1200001
(1 row)

Here is query A:

postgres=# explain select id from test
postgres-# where id > '67542377-b900-4084-b619-bae9448a73b5'
postgres-# and created_at >= '2020-04-17 08:23:22.229' and created_at < '2020-04-17 08:24:26.474'
postgres-# order by id limit 1000;
                                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2335.83..2338.33 rows=1000 width=37)
   ->  Sort  (cost=2335.83..2381.98 rows=18460 width=37)
         Sort Key: id
         ->  Index Scan using index_create_at on test  (cost=0.43..1323.69 rows=18460 width=37)
               Index Cond: ((created_at >= '2020-04-17 08:23:22.229'::timestamp without time zone) AND (created_at < '2020-04-17 08:24:26.474'::timestamp without time zone))
               Filter: ((id)::text > '67542377-b900-4084-b619-bae9448a73b5'::text)
(6 rows)

And here is query B:

postgres=# explain select id from test
postgres-# where id > '67542377-b900-4084-b619-bae9448a73b5'
postgres-# and created_at >= '2020-04-17 08:23:22.229' and created_at < '2020-04-17 08:40:26.474'
postgres-# order by id limit 1000;
                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..122.60 rows=1000 width=37)
   ->  Index Scan using test_pkey on test  (cost=0.43..87966.38 rows=720001 width=37)
         Index Cond: ((id)::text > '67542377-b900-4084-b619-bae9448a73b5'::text)
         Filter: ((created_at >= '2020-04-17 08:23:22.229'::timestamp without time zone) AND (created_at < '2020-04-17 08:40:26.474'::timestamp without time zone))
(4 rows)

I noticed the execution plans are different due to the different `created_at` ranges. In this example, the Postgres optimizer seems to make the right decision (the estimated cost is low).

But in the production database there are billions of rows, and there query B becomes very slow.

I tried writing query B in different ways to change which index is used, but failed; the plan doesn't change.

postgres=# explain select id from
postgres-# (select id from test where created_at >= '2020-04-17 08:23:22.229' and created_at < '2020-04-17 08:40:26.474') t1
postgres-# where id > '67542377-b900-4084-b619-bae9448a73b5'
postgres-# order by id limit 1000;
                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..122.60 rows=1000 width=37)
   ->  Index Scan using test_pkey on test  (cost=0.43..87966.38 rows=720001 width=37)
         Index Cond: ((id)::text > '67542377-b900-4084-b619-bae9448a73b5'::text)
         Filter: ((created_at >= '2020-04-17 08:23:22.229'::timestamp without time zone) AND (created_at < '2020-04-17 08:40:26.474'::timestamp without time zone))
(4 rows)

Below are the query plans from production:

explain select id from t_test_log
where id > '51666ee5ca624653b745a11635c436af'
and created_at >= '2019-02-02T00:00.000Z' and created_at < '2019-02-02T01:00.000Z'
order by id limit 1000;
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=228.45..230.93 rows=991 width=33)
   ->  Sort  (cost=228.45..230.93 rows=991 width=33)
         Sort Key: id
         ->  Index Scan using idx_test_log_create_at on t_test_log  (cost=0.57..179.13 rows=991 width=33)
               Index Cond: ((created_at >= '2019-02-02 00:00:00'::timestamp without time zone) AND (created_at < '2019-02-02 00:01:00'::timestamp without time zone))
               Filter: ((id)::text > '51666ee5ca624653b745a11635c436af'::text)
(6 rows)
explain select id from t_test_log
where id > '51666ee5ca624653b745a11635c436af'
and created_at >= '2019-02-02T00:00.000Z' and created_at < '2019-02-03T01:00.000Z'
order by id limit 1000;
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.70..39262.61 rows=1000 width=33)
   ->  Index Scan using pk_test_log_main on t_test_log  (cost=0.70..52354159.88 rows=1333459 width=33)
         Index Cond: ((id)::text > '51666ee5ca624653b745a11635c436af'::text)
         Filter: ((created_at >= '2019-02-02 00:00:00'::timestamp without time zone) AND (created_at < '2019-02-03 00:01:00'::timestamp without time zone))
(4 rows)

So is it possible to change the plan for query B and force it to use the same plan as query A?

yukuan
  • How do you know that the second query will be slow in production? Did you actually test it? What was the plan Postgres used? –  Apr 17 '20 at 10:07
  • Yes, I tested it in production. if the condition is in a small date range, it will use query A plan, if it is in a big date range, it will use query B plan and it will execute slowly. – yukuan Apr 17 '20 at 10:19
    Then please **[edit]** your question and add the execution plans from production generated using **`explain (analyze, buffers, format text)`** (_not_ just a "simple" explain). Including the actual `create table` and `create index` statement will also help –  Apr 17 '20 at 10:21
  • "the postgres optimizer seems make the right decision (this cost is low)." This is almost a tautology. It chooses the plan with the lowest estimated cost (more or less, it is very complicated and uses some heuristics to shortcut things), so of course the estimated cost is low. But the estimated cost may be delusional, you haven't shown them to be correct by executing the plan. – jjanes Apr 17 '20 at 14:21

1 Answer


In general you cannot force a query to use a specific index, but you can influence the query planner with better statistics, by rewriting the query, or with some configuration parameters.

See these related discussions:
  • How do I force Postgres to use a particular index?
  • Postgres query optimization (forcing an index scan)
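As one example of rewriting the query: on PostgreSQL 12 and later you can stop the planner from flattening the subquery by materializing it explicitly. This is only a sketch based on query B, and whether it actually beats the plan the optimizer chose needs to be verified with `explain (analyze, buffers)`:

```sql
-- Evaluate the date-range filter first by materializing the CTE;
-- the planner can then no longer merge it into the outer query and
-- pick the primary-key index for the whole statement (PostgreSQL 12+).
with t1 as materialized (
    select id
    from test
    where created_at >= '2020-04-17 08:23:22.229'
      and created_at < '2020-04-17 08:40:26.474'
)
select id
from t1
where id > '67542377-b900-4084-b619-bae9448a73b5'
order by id
limit 1000;
```

Note that this forces a scan of the whole date range plus an explicit sort, so it only helps when the range matches few rows relative to the `id` range.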

pifor