I create a test table in my postgresql db to try to describe the question. table name: test columns: id (varchar, uuid), created_at (timestamp)
I insert some mock data into the table.
postgres=# select count(1) from test;
count
---------
1200001
(1 row)
Here is the query A:
postgres=# explain select id from test
postgres-# where id > '67542377-b900-4084-b619-bae9448a73b5'
postgres-# and created_at >= '2020-04-17 08:23:22.229' and created_at < '2020-04-17 08:24:26.474'
postgres-# order by id limit 1000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2335.83..2338.33 rows=1000 width=37)
-> Sort (cost=2335.83..2381.98 rows=18460 width=37)
Sort Key: id
-> Index Scan using index_create_at on test (cost=0.43..1323.69 rows=18460 width=37)
Index Cond: ((created_at >= '2020-04-17 08:23:22.229'::timestamp without time zone) AND (created_at < '2020-04-17 08:24:26.474'::timestamp without time zone))
Filter: ((id)::text > '67542377-b900-4084-b619-bae9448a73b5'::text)
(6 rows)
Then is the query B:
postgres=# explain select id from test
postgres-# where id > '67542377-b900-4084-b619-bae9448a73b5'
postgres-# and created_at >= '2020-04-17 08:23:22.229' and created_at < '2020-04-17 08:40:26.474'
postgres-# order by id limit 1000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..122.60 rows=1000 width=37)
-> Index Scan using test_pkey on test (cost=0.43..87966.38 rows=720001 width=37)
Index Cond: ((id)::text > '67542377-b900-4084-b619-bae9448a73b5'::text)
Filter: ((created_at >= '2020-04-17 08:23:22.229'::timestamp without time zone) AND (created_at < '2020-04-17 08:40:26.474'::timestamp without time zone))
(4 rows)
I noticed the execution plan are different due to the different created_at values. In this example, the postgres optimizer seems make the right decision (this cost is low).
but in the production database, there are billions of data. then the query B will be very slow.
I tried write the query B in different ways to try to change the execution index order but failed. the plan doesn't change.
postgres=# explain select id from
postgres-# (select id from test where created_at >= '2020-04-17 08:23:22.229' and created_at < '2020-04-17 08:40:26.474') t1
postgres-# where id > '67542377-b900-4084-b619-bae9448a73b5'
postgres-# order by id limit 1000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..122.60 rows=1000 width=37)
-> Index Scan using test_pkey on test (cost=0.43..87966.38 rows=720001 width=37)
Index Cond: ((id)::text > '67542377-b900-4084-b619-bae9448a73b5'::text)
Filter: ((created_at >= '2020-04-17 08:23:22.229'::timestamp without time zone) AND (created_at < '2020-04-17 08:40:26.474'::timestamp without time zone))
(4 rows)
Below is the query plan from production:
explain select id from t_test_log
where id > '51666ee5ca624653b745a11635c436af'
and created_at >= '2019-02-02T00:00.000Z' and created_at < '2019-02-02T01:00.000Z'
order by id limit 1000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=228.45..230.93 rows=991 width=33)
-> Sort (cost=228.45..230.93 rows=991 width=33)
Sort Key: id
-> Index Scan using idx_test_log_create_at on t_test_log (cost=0.57..179.13 rows=991 width=33)
Index Cond: ((created_at >= '2019-02-02 00:00:00'::timestamp without time zone) AND (created_at < '2019-02-02 00:01:00'::timestamp without time zone))
Filter: ((id)::text > '51666ee5ca624653b745a11635c436af'::text)
(6 rows)
explain select id from t_test_log
where id > '51666ee5ca624653b745a11635c436af'
and created_at >= '2019-02-02T00:00.000Z' and created_at < '2019-02-03T01:00.000Z'
order by id limit 1000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.70..39262.61 rows=1000 width=33)
-> Index Scan using pk_test_log_main on t_test_log (cost=0.70..52354159.88 rows=1333459 width=33)
Index Cond: ((id)::text > '51666ee5ca624653b745a11635c436af'::text)
Filter: ((created_at >= '2019-02-02 00:00:00'::timestamp without time zone) AND (created_at < '2019-02-03 00:01:00'::timestamp without time zone))
(4 rows)
So is it possible to change the query B plan to force it use the plan same as query A ?