I'm having a table (demo
) with a sequence as its primary key (seqno
) and a geometry
property contained within a JSONB column (doc
). I have configured a primary key constraint for the sequence column and a GiST index for the geometry. I have already gathered statistics by running VACUUM ANALYZE
. It's a fairly large table (42M rows).
CREATE TABLE demo
(
seqno bigint NOT NULL DEFAULT nextval('seqno'::regclass),
doc jsonb NOT NULL DEFAULT '{}'::jsonb,
CONSTRAINT demo_pkey PRIMARY KEY (seqno)
)
CREATE INDEX demo_doc_geometry_gist
ON demo USING gist (st_geometryfromtext(doc ->> 'geometry'::text))
I want to perform a spatial filter on a rather large area and return the first 10 rows, sorted by its primary key. Therefore, I have tried the following query:
SELECT seqno, doc
FROM demo
WHERE ST_Within(ST_GeometryFromText((doc->>'geometry')), ST_GeometryFromText('POLYGON((4.478054829251019 52.61266886732067,5.247097798001019 52.61266886732067,5.247097798001019 52.156694555984416,4.478054829251019 52.156694555984416,4.478054829251019 52.61266886732067))'))
ORDER BY seqno
LIMIT 10
This results in the following query plan:
Limit (cost=1000.59..15169.06 rows=10 width=633) (actual time=2479.372..2496.737 rows=10 loops=1)
-> Gather Merge (cost=1000.59..19780184.81 rows=13960 width=633) (actual time=2479.370..2496.732 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Index Scan using demo_pkey on demo (cost=0.56..19777573.45 rows=5817 width=633) (actual time=2440.310..2450.101 rows=5 loops=3)
Filter: (('0103000020407100000100000005000000CFCA3EB32997F4402D3225A6F0D02041DDFD612B4A5F0141D66C69E40CCD20415E0E6F193D580141AE7BECF122511C412C99A20E8F48F440E6B3764403591C41CFCA3EB32997F4402D3225A6F0D02041'::geometry ~ st_geometryfromtext((doc ->> 'geometry'::text))) AND _st_contains('0103000020407100000100000005000000CFCA3EB32997F4402D3225A6F0D02041DDFD612B4A5F0141D66C69E40CCD20415E0E6F193D580141AE7BECF122511C412C99A20E8F48F440E6B3764403591C41CFCA3EB32997F4402D3225A6F0D02041'::geometry, st_geometryfromtext((doc ->> 'geometry'::text))))
Rows Removed by Filter: 221313
Planning Time: 0.375 ms
Execution Time: 2496.786 ms
This shows that the primary key constraint index is used to scan all rows and perform the spatial filter on each row, which is obviously very inefficient. There are more than 5M matches for the given spatial predicate. The GiST index is not used at all.
However, when leaving out the ORDER BY clause, the GiST index for the geometry property is properly used, which is far more efficient.
Limit (cost=0.42..128.90 rows=10 width=633) (actual time=0.381..0.745 rows=10 loops=1)
-> Index Scan using demo_doc_geometry_gist on demo (cost=0.42..179352.99 rows=13960 width=633) (actual time=0.380..0.742 rows=10 loops=1)
Index Cond: ('0103000020407100000100000005000000CFCA3EB32997F4402D3225A6F0D02041DDFD612B4A5F0141D66C69E40CCD20415E0E6F193D580141AE7BECF122511C412C99A20E8F48F440E6B3764403591C41CFCA3EB32997F4402D3225A6F0D02041'::geometry ~ st_geometryfromtext((doc ->> 'geometry'::text)))
Filter: _st_contains('0103000020407100000100000005000000CFCA3EB32997F4402D3225A6F0D02041DDFD612B4A5F0141D66C69E40CCD20415E0E6F193D580141AE7BECF122511C412C99A20E8F48F440E6B3764403591C41CFCA3EB32997F4402D3225A6F0D02041'::geometry, st_geometryfromtext((doc ->> 'geometry'::text)))
Planning Time: 0.245 ms
Execution Time: 0.780 ms
Is there a way to make this query fast? Can we let the query planner combine the GiST index with the PK index to get a sorted result? Any other suggestions?