In a Postgres database, I am querying the distinct values of MY_DATE in a large table with 300 million rows. There are only about 400 distinct values, and the column MY_DATE is indexed.
Select distinct MY_DATE from MY_TABLE;
The query runs for 22 minutes.
The same query on my Oracle DB, with the exact same data set and the same index definition, runs in 11 seconds.
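For reference, the index on the Postgres side is equivalent to the following (a minimal sketch, assuming a plain single-column btree index; the actual DDL may differ):

-- assumed definition of idx_obsdate: a default btree index on the date column
CREATE INDEX idx_obsdate ON my_table (my_date);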
The query plan shows that the query is using the index:
EXPLAIN Select distinct MY_DATE from MY_TABLE LIMIT 200;
gives:
QUERY PLAN
Limit (cost=0.57..7171644.14 rows=200 width=8)
-> Unique (cost=0.57..15419034.24 rows=430 width=8)
-> Index Only Scan using idx_obsdate on my_table (cost=0.57..14672064.14 rows=298788038 width=8)
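The plan above shows cost estimates only. If actual timings and buffer counts would help, I can also run an instrumented version along these lines (standard EXPLAIN options; I have not captured that output yet, since it executes the full 22-minute query):

-- executes the full query and reports actual row counts, timing, and buffer usage
EXPLAIN (ANALYZE, BUFFERS) SELECT DISTINCT MY_DATE FROM MY_TABLE;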
When I limit the results, the query can be much faster, e.g.
Select distinct MY_DATE from MY_TABLE LIMIT 5;
runs in sub-seconds.
but:
Select distinct MY_DATE from MY_TABLE LIMIT 50;
already takes minutes. The runtime seems to increase exponentially with the value of the LIMIT
clause.
I would expect the Postgres query to run in seconds, as it does on Oracle. 22 minutes for an index scan, even on a large table, seems way off the mark.
Any suggestions as to what is causing this and what I can do about it?