I have a table in PostgreSQL 9.2 that has a `text` column; call it `text_col`. The values in this column are fairly unique (5-6 duplicates at the most). The table has ~5 million rows, and about half of those rows contain a NULL value for `text_col`. When I execute the following query I expect 1-5 rows; in most cases (>80%) I expect only 1 row.
Query
explain analyze SELECT col1,col2.. colN
FROM table
WHERE text_col = 'my_value';
A btree index exists on `text_col`, but the query planner never uses it, and I am not sure why. This is the output of the query.
Planner
Seq Scan on table  (cost=0.00..459573.08 rows=93 width=339) (actual time=1392.864..3196.283 rows=2 loops=1)
  Filter: (text_col = 'my_value'::text)
  Rows Removed by Filter: 4077384
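For reference, this is the kind of diagnostic I ran to inspect the statistics the planner is working from (using the same placeholder table and column names as above; the actual names differ):

```sql
-- Check the column statistics the planner uses for selectivity estimates.
SELECT null_frac, n_distinct, avg_width
FROM pg_stats
WHERE tablename = 'table' AND attname = 'text_col';

-- Refresh the statistics in case they are stale.
ANALYZE table;
```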
I added a partial index to try to filter out the NULL values, but that did not help, with or without `text_pattern_ops`. (I do not strictly need `text_pattern_ops`, since my queries contain no `LIKE` conditions, but that operator class also supports equality.)
CREATE INDEX name_idx
ON table
USING btree
(text_col COLLATE pg_catalog."default" text_pattern_ops)
WHERE text_col IS NOT NULL;
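Since `text_pattern_ops` is not required for plain equality, I also tried the default-operator-class variant of the partial index (same placeholder names; `name_default_idx` is just an illustrative name):

```sql
CREATE INDEX name_default_idx
  ON table
  USING btree (text_col)
  WHERE text_col IS NOT NULL;
```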
Even after disabling sequential scans with `set enable_seqscan = off;`, the planner still picks the seq scan over an index scan. In summary...
- The number of rows returned by this query is small.
- Given that the non-null rows are fairly unique, an index scan over the text column should be faster.
- Vacuuming and analyzing the table did not help the optimizer pick the index.
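The check I used to confirm the seq scan persists even when penalized looked like this (placeholder names as above):

```sql
-- Penalize sequential scans for this session only.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT col1, col2
FROM table
WHERE text_col = 'my_value';
-- The plan still shows a Seq Scan node, just with a large cost added.

RESET enable_seqscan;
```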
My questions
- Why does the database pick the sequential scan over the index scan?
- When querying a text column for equality, are there any best practices I can follow?
- How do I reduce the time taken for this query?
[Edit - More information]
- The index scan is chosen on my local database, which holds about 10% of the data available in production.