I have a very simple SQL:
select * from email.email_task where acquire_time < now() and state IN ('CREATED', 'RELEASED') order by creation_time asc limit 1;
I have 2 indexes created:
- Index of state
- Index of state, acquire_time, creation_time
Ideally I think Postgres should pick the 2nd one since it matches every column required in this SQL:
However the execution plan shows differently, it uses neither of the indexes:
Limit (cost=187404.36..187404.36 rows=1 width=743)
-> Sort (cost=187404.36..190753.58 rows=1339690 width=743)
Sort Key: creation_time
-> Seq Scan on email_task (cost=0.00..180705.91 rows=1339690 width=743)
Filter: (((state)::text = 'CREATED'::text) AND (acquire_time < now()))
I understand that if the number of rows returned arrives like 10% of total, then it would pick Seq Scan over Index Scan. (As explained at Why does PostgreSQL perform sequential scan on indexed column? ) So that's why index1 is not picked.
What I don't understand is why index2 is not picked since matches all the columns?
Then I tried a 3rd index:
- Index of create_time, acquire_time, state
And this time it uses the index3 (I add the index using another smaller database perf_1 because the original one has 2 million rows and it takes too much time)
Limit (cost=0.29..0.36 rows=1 width=75) (actual time=0.043..0.043 rows=1 loops=1)
-> Index Scan using perf_1 on email_task (cost=0.29..763.76 rows=9998 width=75) (actual time=0.042..0.042 rows=1 loops=1)
Index Cond: (acquire_time < now())
Filter: ((state)::text = ANY ('{CREATED,RELEASED}'::text[]))
It seems that, Postgres execution planner is picking the order by clause first then the where clause which is a little bit counter-intuitive.
Is my understanding correct or there are some other factors that impact the Postgres planner?
Thanks in advance.