I'm running a query to count the number of points falling inside each polygon in a table. The point layer (twitter) has ~500 million rows and the polygon layer (us_urbanareas) has ~3,000 rows. The point layer's geometry column is indexed.
The query looks like:
EXPLAIN SELECT us_urbanareas.name, us_urbanareas.gid, count(twitter.coords) AS total
FROM us_urbanareas LEFT JOIN twitter
ON st_contains(ST_Transform(us_urbanareas.geom,4326),twitter.coords)
GROUP BY us_urbanareas.gid LIMIT 500
and the EXPLAIN output looks like:
"Limit (cost=1265.59..47875481.71 rows=500 width=45)"
" -> GroupAggregate (cost=1265.59..342780653.01 rows=3580 width=45)"
" -> Nested Loop Left Join (cost=1265.59..340247956.29 rows=506532183 width=45)"
" -> Index Scan using us_urbanareas_pkey on us_urbanareas (cost=0.28..1000.18 rows=3580 width=4028)"
" -> Bitmap Heap Scan on twitter (cost=1265.31..94899.56 rows=14149 width=32)"
" Recheck Cond: (st_transform(us_urbanareas.geom, 4326) && coords)"
" Filter: _st_contains(st_transform(us_urbanareas.geom, 4326), coords)"
" -> Bitmap Index Scan on coord_gist (cost=0.00..1261.77 rows=42447 width=0)"
" Index Cond: (st_transform(us_urbanareas.geom, 4326) && coords)"
The query works but is fairly slow, and I receive this notice:
NOTICE: gserialized_gist_joinsel: jointype 1 not supported
The DB server I'm working on has plenty of memory (96 GB), and we've just been through a process of optimizing its configuration to make the most of it. I'm running PostgreSQL 9.3.2.
Any thoughts on how to optimize the query and work around the "jointype not supported" notice?
Update:
1. Updated the SRIDs of the us_urbanareas table to remove the transform.
2. Removed the LEFT JOIN and rewrote the query as per the suggestion below.
3. Tried setting enable_bitmapscan = off.
Turning off bitmap scans drastically reduces the estimated cost in the EXPLAIN, but roughly doubles the actual query time (tested using a few random polygons).
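The bitmap-scan toggle is a per-session planner setting, applied along these lines before re-running EXPLAIN:

```sql
-- Disable bitmap scans for this session only; the planner
-- then falls back to a plain index scan on coord_gist
SET enable_bitmapscan = off;
```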
"Limit (cost=0.83..50601686.53 rows=500 width=45)"
" -> GroupAggregate (cost=0.83..362308070.41 rows=3580 width=45)"
" -> Nested Loop (cost=0.83..359737718.66 rows=514063189 width=45)"
" -> Index Scan using us_urbanareas_pkey on us_urbanareas us (cost=0.28..1002.76 rows=3580 width=45)"
" -> Index Scan using coord_gist on twitter tw (cost=0.55..100341.53 rows=14359 width=32)"
" Index Cond: (us.geom && coords)"
" Filter: _st_contains(us.geom, coords)"
Update 4: executing
select * from pg_stat_activity;
returns:
"INSERT INTO twitter(tweet,name,handle,location,coords,time) VALUES ($$T :($$,$$anissa aguilar$$,$$babyniss_$$,$$Abq/Rio, NM $$,ST_GeomFromText('POINT(-106.659914 35.23192)', (...)"
"SELECT version();"
"SELECT t.oid, t.xmin, t.*, relname, CASE WHEN relkind = 'r' THEN TRUE ELSE FALSE END AS parentistable, nspname, des.description, l.lanname, p.prosrc,
substring(pg_get_triggerdef(t.oid), 'WHEN (.*) EXECUTE PROCEDURE') AS whenclause
FROM pg_trigger t
(...)"
"select * from pg_stat_activity;"
" SELECT us.name, us.gid, count(tw.coords) AS total
FROM us_urbanareas us, twitter tw
WHERE st_contains(us.geom, tw.coords)
GROUP BY us.gid;"
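For completeness, the SRID update from step 1 was a one-off change along these lines (a sketch, assuming the column is named geom, holds MultiPolygons, and the target SRID is 4326):

```sql
-- Reproject the polygon column once so the join no longer needs ST_Transform
ALTER TABLE us_urbanareas
  ALTER COLUMN geom TYPE geometry(MultiPolygon, 4326)
  USING ST_Transform(geom, 4326);
```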