I have a table with ~4.5 million data for 3 months. I have indexed the "accessedAt" column which stores time in epoch and has datatype BIGINT, When I ran the query postgres did a bit map index scan on 700K rows. in (~48.2s). But when I dropped the index it did a seq scan on 700K rows in (~4s).
[QUERY] explain analyze select id from access_histories where "accessedAt" >= 1631903400 and "accessedAt" <= 1633112999;
i.e from 17 sept 2021 - 01 oct 2021.
Bitmap Heap Scan on access_histories (cost=14655.35..144416.85 rows=715992 width=8) (actual time=238.837..21154.597 rows=715535 loops=1)
Recheck Cond: (("accessedAt" >= 1631903400) AND ("accessedAt" <= 1633112999))
Rows Removed by Index Recheck: 1716759
Heap Blocks: exact=48015 lossy=33133
Buffers: shared hit=3 read=82972 written=8
I/O Timings: read=19860.378 write=4.389
-> Bitmap Index Scan on "access_histories_accessedAt_idx" (cost=0.00..14476.35 rows=715992 width=0) (actual time=229.267..229.272 rows=715535 loops=1)
Index Cond: (("accessedAt" >= 1631903400) AND ("accessedAt" <= 1633112999))
Buffers: shared hit=3 read=1824
I/O Timings: read=62.423
Planning:
Buffers: shared hit=20 read=1
I/O Timings: read=0.012
Planning Time: 1.168 ms
Execution Time: 21198.731 ms
After dropping the index.
Seq Scan on access_histories (cost=0.00..155405.02 rows=715992 width=8) (actual time=0.035..10278.345 rows=715535 loops=1)
Filter: (("accessedAt" >= 1631903400) AND ("accessedAt" <= 1633112999))
Rows Removed by Filter: 4234466
Buffers: shared hit=23814 read=57341
I/O Timings: read=2083.283
Planning:
Buffers: shared hit=76 read=17
I/O Timings: read=7.851
Planning Time: 8.286 ms
Execution Time: 10316.716 ms
(10 rows)
So my question is why is postgres opting to go for index scan although sequential is less expensive.
p.s I know that postgres will go for seq scan if it has to query 5-10% of total data.