I am facing a strange and infuriating problem of my PostgreSQL 10 database not using indexes in specific situations. I have created an index of an md5 hash of one of the columns, and this index seems to work fine under normal circumstances:
EXPLAIN
SELECT * FROM variants_table WHERE md5(variant_id) = md5('1_9746683_A_G');
QUERY PLAN
Index Scan using idx_variant_id_md5 on variants_table (cost=0.71..50470.36 rows=12473 width=748)
Index Cond: (md5(variant_id) = '00fa85f0ea97a4aa2898f838f427e560'::text)
(2 rows)
However, when I try to delete duplicates using this query:
EXPLAIN
WITH Ranked AS (
SELECT
md5(variant_id) as variant_id_md5,
consscore,
ROW_NUMBER() OVER (PARTITION BY md5(variant_id) ORDER BY consscore DESC) as rn
FROM
variants_table
)
DELETE FROM variants_table
WHERE (md5(variant_id), consscore) IN (
SELECT variant_id_md5, consscore
FROM Ranked
WHERE rn > 1
);
QUERY PLAN
Delete on variants_table (cost=3647060725.61..4520666023.49 rows=2949470436 width=70)
CTE ranked
-> WindowAgg (cost=3096491910.93..3361944250.15 rows=11797881743 width=48)
-> Sort (cost=3096491910.93..3125986615.29 rows=11797881743 width=40)
Sort Key: (md5(variants_table_1.variant_id)), variants_table_1.consscore DESC
-> Seq Scan on variants_table variants_table_1 (cost=0.00..800237220.79 rows=11797881743 width=40)
-> Hash Join (cost=285116475.46..1158721773.34 rows=2949470436 width=70)
Hash Cond: ((md5(variants_table.variant_id) = ranked.variant_id_md5) AND (variants_table.consscore = ranked.consscore))
-> Seq Scan on variants_table (cost=0.00..770742516.43 rows=11797881743 width=46)
-> Hash (cost=285115875.46..285115875.46 rows=40000 width=104)
-> HashAggregate (cost=285115475.46..285115875.46 rows=40000 width=104)
Group Key: ranked.variant_id_md5, ranked.consscore
-> CTE Scan on ranked (cost=0.00..265452339.22 rows=3932627248 width=104)
Filter: (rn > 1)
(14 rows)
It defaults to a sequential scan.
I have tried everything to debug this including setting enable_seqscan = off
, but no matter what, the query planner still uses the sequential scan, leading me to assume that it doesn't know how to use the indexes.