I have a performance issue with Postgres. I have two tables with the same structure, same indexes and I also executed the same CLUSTER on id_coordinate index on both tables. The tables have the following structure:
Column | Type | Modifiers | Storage | Description
----------------+----------+-------------------------------------------+---------+-------------
id_best_server | integer | not null default nextval('seq'::regclass) | plain |
date | date | not null | plain |
id_coordinate | integer | not null | plain |
mnc | smallint | | plain |
id_cell | integer | | plain |
rx_level | real | | plain |
rx_quality | real | | plain |
sqi | real | | plain |
Indexes:
"history_best_server_until_2013_10_pkey" PRIMARY KEY, btree (id_best_server)
"ix_history_best_server_until_2013_10_id_coordinate" btree (id_coordinate) CLUSTER
"ix_history_best_server_until_2013_10_id_best_server" btree (id_best_server)
Query executed:
EXPLAIN ANALYZE SELECT DISTINCT ON (x, y) x, y, rx_level, rx_quality, date, mnc, id_cell
FROM
(
SELECT X(co.location) AS x, Y(co.location) AS y, tems.rx_level, tems.rx_quality, date, mnc, id_cell
FROM tems.history_best_server_until_2012_10 AS tems
JOIN gis.coordinate AS co ON tems.id_coordinate = co.id_coordinate
AND co.location && setsrid(makeBox2d(GeomFromText('POINT(101000 461500)', 2710),
GeomFromText('POINT(102400 463610)', 2710)
), 2710)
WHERE mnc = 41
) AS j1
ORDER BY x, y, date DESC
Both tables have almost the same number of rows (around 8M). When I execute the query above, on one table i get these results:
"Unique (cost=245742.87..245805.99 rows=8416 width=118) (actual time=3420.966..3425.584 rows=10009 loops=1)"
" -> Sort (cost=245742.87..245763.91 rows=8416 width=118) (actual time=3420.963..3422.236 rows=10212 loops=1)"
" Sort Key: (x(co.location)), (y(co.location)), tems.date"
" Sort Method: quicksort Memory: 1182kB"
" -> Hash Join (cost=61069.15..245194.20 rows=8416 width=118) (actual time=191.365..3405.590 rows=10212 loops=1)"
" Hash Cond: (tems.id_coordinate = co.id_coordinate)"
" -> Seq Scan on history_best_server_until_2012_10 tems (cost=0.00..147705.35 rows=3226085 width=22) (actual time=0.009..1749.468 rows=3230507 loops=1)"
" Filter: (mnc = 41)"
" -> Hash (cost=60697.73..60697.73 rows=29714 width=104) (actual time=46.828..46.828 rows=31806 loops=1)"
" Buckets: 4096 Batches: 1 Memory Usage: 1864kB"
" -> Bitmap Heap Scan on coordinate co (cost=937.22..60697.73 rows=29714 width=104) (actual time=14.975..35.561 rows=31806 loops=1)"
" Recheck Cond: (location && '0103000020960A000001000000050000000000000080A8F84000000000F02A1C410000000080A8F84000000000E84B1C41000000000000F94000000000E84B1C41000000000000F94000000000F02A1C410000000080A8F84000000000F02A1C41'::geome (...)"
" -> Bitmap Index Scan on ix_coordinate_location (cost=0.00..929.79 rows=29714 width=0) (actual time=14.593..14.593 rows=31806 loops=1)"
" Index Cond: (location && '0103000020960A000001000000050000000000000080A8F84000000000F02A1C410000000080A8F84000000000E84B1C41000000000000F94000000000E84B1C41000000000000F94000000000F02A1C410000000080A8F84000000000F02A1C41'::g (...)"
"Total runtime: 3426.635 ms"
On the other table, it looks like this:
"Unique (cost=267070.35..267138.75 rows=9120 width=118) (actual time=172.333..177.232 rows=10051 loops=1)"
" -> Sort (cost=267070.35..267093.15 rows=9120 width=118) (actual time=172.330..173.708 rows=10256 loops=1)"
" Sort Key: (x(co.location)), (y(co.location)), tems.date"
" Sort Method: quicksort Memory: 1186kB"
" -> Nested Loop (cost=937.22..266470.49 rows=9120 width=118) (actual time=14.876..156.322 rows=10256 loops=1)"
" -> Bitmap Heap Scan on coordinate co (cost=937.22..60697.73 rows=29714 width=104) (actual time=14.788..29.510 rows=31806 loops=1)"
" Recheck Cond: (location && '0103000020960A000001000000050000000000000080A8F84000000000F02A1C410000000080A8F84000000000E84B1C41000000000000F94000000000E84B1C41000000000000F94000000000F02A1C410000000080A8F84000000000F02A1C41'::geometry)"
" -> Bitmap Index Scan on ix_coordinate_location (cost=0.00..929.79 rows=29714 width=0) (actual time=14.409..14.409 rows=31806 loops=1)"
" Index Cond: (location && '0103000020960A000001000000050000000000000080A8F84000000000F02A1C410000000080A8F84000000000E84B1C41000000000000F94000000000E84B1C41000000000000F94000000000F02A1C410000000080A8F84000000000F02A1C41'::geometr (...)"
" -> Index Scan using ix_history_best_server_until_2013_10_id_coordinate on history_best_server_until_2013_10 tems (cost=0.00..6.91 rows=1 width=22) (actual time=0.003..0.003 rows=0 loops=31806)"
" Index Cond: (id_coordinate = co.id_coordinate)"
" Filter: (mnc = 41)"
"Total runtime: 178.280 ms"
Total runtime is different.
If "WHERE mnc = 41" is not used, they both work quick. I have no idea what causes the sequence scan in first case. Note that mnc can have one of only 3 possible values. Frequences of each value are about 41%, 39%, 20% on faster table and 43%, 41%, 16% on slower table.
ADDED: This is the statistics for fast table.
tablename | attname | n_distinct | correlation | most_common_freqs
-----------------------------------+----------------+------------+-------------+-------------------
history_best_server_until_2013_10 | id_best_server | -1 | 1 |
history_best_server_until_2013_10 | date | 1122 | -0.206991 | many values
history_best_server_until_2013_10 | id_coordinate | -0.373645 | 1 | many values
history_best_server_until_2013_10 | mnc | 3 | 0.30477 | {0.411783,0.386967,0.20125}
history_best_server_until_2013_10 | id_cell | 5811 | -0.0759416 | many values
history_best_server_until_2013_10 | rx_level | 14961 | -0.122292 | many values
history_best_server_until_2013_10 | rx_quality | 16 | 0.360472 | many values
history_best_server_until_2013_10 | sqi | 5552 | 0.212023 | many values
(8 rows)
This one is for the slow one:
tablename | attname | n_distinct | correlation | most_common_freqs
-----------------------------------+----------------+------------+-------------+-------------------
history_best_server_until_2012_10 | id_best_server | -1 | 1 |
history_best_server_until_2012_10 | date | 954 | -0.205897 | many values
history_best_server_until_2012_10 | id_coordinate | -0.421911 | 1 | many values
history_best_server_until_2012_10 | mnc | 3 | 0.314319 | {0.4349,0.402433,0.162667}
history_best_server_until_2012_10 | id_cell | 5617 | -0.0715787 | many values
history_best_server_until_2012_10 | rx_level | 14129 | -0.115288 | many values
history_best_server_until_2012_10 | rx_quality | 22 | 0.368943 | many values
history_best_server_until_2012_10 | sqi | 5320 | 0.226596 | many values
Table definition for gis.coordinate
Table "gis.coordinate"
Column | Type | Modifiers | Storage | Description
---------------+----------+------------------------------------------------------------------------+---------+-------------
id_coordinate | integer | not null default nextval('gis.coordinate_id_coordinate_seq'::regclass) | plain |
location | geometry | | main |
Indexes:
"coordinate_pkey" PRIMARY KEY, btree (id_coordinate)
"ix_pk_coordinate" UNIQUE, btree (id_coordinate) CLUSTER
"ix_coordinate_location" gist (location)
Check constraints:
"enforce_dims_location" CHECK (ndims(location) = 2)
"enforce_geotype_location" CHECK (geometrytype(location) = 'POINT'::text OR location IS NULL)
"enforce_srid_location" CHECK (srid(location) = 2710)