When selecting MIN
on a column in PostgreSQL (11, 12, 13) after a GROUP BY
operation on multiple columns, any index created on the grouped columns is not used: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=30e0f341940f4c1fa6013677643a0baf
CREATE TABLE tags (id serial, series int, index int, page int);
CREATE INDEX ON tags (page, series, index);
INSERT INTO tags (series, index, page)
SELECT
ceil(random() * 10),
ceil(random() * 100),
ceil(random() * 1000)
FROM generate_series(1, 100000);
EXPLAIN ANALYZE
SELECT tags.page, tags.series, MIN(tags.index)
FROM tags GROUP BY tags.page, tags.series;
HashAggregate (cost=2291.00..2391.00 rows=10000 width=12) (actual time=108.968..133.153 rows=9999 loops=1)
Group Key: page, series
Batches: 1 Memory Usage: 1425kB
-> Seq Scan on tags (cost=0.00..1541.00 rows=100000 width=12) (actual time=0.015..55.240 rows=100000 loops=1)
Planning Time: 0.257 ms
Execution Time: 133.771 ms
Theoretically, the index should allow the database to seek in steps of (tags.page, tags.series)
instead of performing a full scan. This would result in 10,000 processed rows for above dataset instead of 100,000. This link describes the method with no grouped columns.
This answer (as well as this one) suggests using DISTINCT ON
with an ordering instead of GROUP BY
but that produces this query plan:
Unique (cost=0.42..5680.42 rows=10000 width=12) (actual time=0.066..268.038 rows=9999 loops=1)
-> Index Only Scan using tags_page_series_index_idx on tags (cost=0.42..5180.42 rows=100000 width=12) (actual time=0.064..227.219 rows=100000 loops=1)
Heap Fetches: 100000
Planning Time: 0.426 ms
Execution Time: 268.712 ms
While the index is now being used, it still appears to be scanning the full set of rows. When using SET enable_seqscan=OFF
, the GROUP BY
query degrades to the same behaviour.
How can I encourage PostgreSQL to use the multi-column index?