1

I have a performance issue with Postgres. I have two tables with the same structure, same indexes and I also executed the same CLUSTER on id_coordinate index on both tables. The tables have the following structure:

     Column     |   Type   |                  Modifiers                | Storage | Description
----------------+----------+-------------------------------------------+---------+-------------
 id_best_server | integer  | not null default nextval('seq'::regclass) | plain   |
 date           | date     | not null                                  | plain   |
 id_coordinate  | integer  | not null                                  | plain   |
 mnc            | smallint |                                           | plain   |
 id_cell        | integer  |                                           | plain   |
 rx_level       | real     |                                           | plain   |
 rx_quality     | real     |                                           | plain   |
 sqi            | real     |                                           | plain   |

Indexes:
    "history_best_server_until_2013_10_pkey" PRIMARY KEY, btree (id_best_server)
    "ix_history_best_server_until_2013_10_id_coordinate" btree (id_coordinate) CLUSTER
    "ix_history_best_server_until_2013_10_id_best_server" btree (id_best_server)

Query executed:

EXPLAIN ANALYZE SELECT DISTINCT ON (x, y) x, y, rx_level, rx_quality, date, mnc, id_cell
FROM
(
    SELECT X(co.location) AS x, Y(co.location) AS y, tems.rx_level, tems.rx_quality, date, mnc, id_cell
    FROM tems.history_best_server_until_2012_10 AS tems
    JOIN gis.coordinate AS co ON tems.id_coordinate = co.id_coordinate
        AND co.location && setsrid(makeBox2d(GeomFromText('POINT(101000 461500)', 2710),
                             GeomFromText('POINT(102400 463610)', 2710)
                             ), 2710)
    WHERE mnc = 41
) AS j1
ORDER BY x, y, date DESC

Both tables have almost the same number of rows (around 8M). When I execute the query above, on one table i get these results:

"Unique  (cost=245742.87..245805.99 rows=8416 width=118) (actual time=3420.966..3425.584 rows=10009 loops=1)"
"  ->  Sort  (cost=245742.87..245763.91 rows=8416 width=118) (actual time=3420.963..3422.236 rows=10212 loops=1)"
"        Sort Key: (x(co.location)), (y(co.location)), tems.date"
"        Sort Method: quicksort  Memory: 1182kB"
"        ->  Hash Join  (cost=61069.15..245194.20 rows=8416 width=118) (actual time=191.365..3405.590 rows=10212 loops=1)"
"              Hash Cond: (tems.id_coordinate = co.id_coordinate)"
"              ->  Seq Scan on history_best_server_until_2012_10 tems  (cost=0.00..147705.35 rows=3226085 width=22) (actual time=0.009..1749.468 rows=3230507 loops=1)"
"                    Filter: (mnc = 41)"
"              ->  Hash  (cost=60697.73..60697.73 rows=29714 width=104) (actual time=46.828..46.828 rows=31806 loops=1)"
"                    Buckets: 4096  Batches: 1  Memory Usage: 1864kB"
"                    ->  Bitmap Heap Scan on coordinate co  (cost=937.22..60697.73 rows=29714 width=104) (actual time=14.975..35.561 rows=31806 loops=1)"
"                          Recheck Cond: (location && '0103000020960A000001000000050000000000000080A8F84000000000F02A1C410000000080A8F84000000000E84B1C41000000000000F94000000000E84B1C41000000000000F94000000000F02A1C410000000080A8F84000000000F02A1C41'::geome (...)"
"                          ->  Bitmap Index Scan on ix_coordinate_location  (cost=0.00..929.79 rows=29714 width=0) (actual time=14.593..14.593 rows=31806 loops=1)"
"                                Index Cond: (location && '0103000020960A000001000000050000000000000080A8F84000000000F02A1C410000000080A8F84000000000E84B1C41000000000000F94000000000E84B1C41000000000000F94000000000F02A1C410000000080A8F84000000000F02A1C41'::g (...)"
"Total runtime: 3426.635 ms"

On the other table, it looks like this:

"Unique  (cost=267070.35..267138.75 rows=9120 width=118) (actual time=172.333..177.232 rows=10051 loops=1)"
"  ->  Sort  (cost=267070.35..267093.15 rows=9120 width=118) (actual time=172.330..173.708 rows=10256 loops=1)"
"        Sort Key: (x(co.location)), (y(co.location)), tems.date"
"        Sort Method: quicksort  Memory: 1186kB"
"        ->  Nested Loop  (cost=937.22..266470.49 rows=9120 width=118) (actual time=14.876..156.322 rows=10256 loops=1)"
"              ->  Bitmap Heap Scan on coordinate co  (cost=937.22..60697.73 rows=29714 width=104) (actual time=14.788..29.510 rows=31806 loops=1)"
"                    Recheck Cond: (location && '0103000020960A000001000000050000000000000080A8F84000000000F02A1C410000000080A8F84000000000E84B1C41000000000000F94000000000E84B1C41000000000000F94000000000F02A1C410000000080A8F84000000000F02A1C41'::geometry)"
"                    ->  Bitmap Index Scan on ix_coordinate_location  (cost=0.00..929.79 rows=29714 width=0) (actual time=14.409..14.409 rows=31806 loops=1)"
"                          Index Cond: (location && '0103000020960A000001000000050000000000000080A8F84000000000F02A1C410000000080A8F84000000000E84B1C41000000000000F94000000000E84B1C41000000000000F94000000000F02A1C410000000080A8F84000000000F02A1C41'::geometr (...)"
"              ->  Index Scan using ix_history_best_server_until_2013_10_id_coordinate on history_best_server_until_2013_10 tems  (cost=0.00..6.91 rows=1 width=22) (actual time=0.003..0.003 rows=0 loops=31806)"
"                    Index Cond: (id_coordinate = co.id_coordinate)"
"                    Filter: (mnc = 41)"
"Total runtime: 178.280 ms"

Total runtime is different.

If "WHERE mnc = 41" is not used, they both work quick. I have no idea what causes the sequence scan in first case. Note that mnc can have one of only 3 possible values. Frequences of each value are about 41%, 39%, 20% on faster table and 43%, 41%, 16% on slower table.

ADDED: This is the statistics for fast table.

             tablename             |    attname     | n_distinct | correlation | most_common_freqs
-----------------------------------+----------------+------------+-------------+-------------------
 history_best_server_until_2013_10 | id_best_server |         -1 |           1 |
 history_best_server_until_2013_10 | date           |       1122 |   -0.206991 | many values
 history_best_server_until_2013_10 | id_coordinate  |  -0.373645 |           1 | many values
 history_best_server_until_2013_10 | mnc            |          3 |     0.30477 | {0.411783,0.386967,0.20125}
 history_best_server_until_2013_10 | id_cell        |       5811 |  -0.0759416 | many values
 history_best_server_until_2013_10 | rx_level       |      14961 |   -0.122292 | many values
 history_best_server_until_2013_10 | rx_quality     |         16 |    0.360472 | many values
 history_best_server_until_2013_10 | sqi            |       5552 |    0.212023 | many values
(8 rows)

This one is for the slow one:

             tablename             |    attname     | n_distinct | correlation | most_common_freqs
-----------------------------------+----------------+------------+-------------+-------------------
 history_best_server_until_2012_10 | id_best_server |         -1 |           1 |
 history_best_server_until_2012_10 | date           |        954 |   -0.205897 | many values
 history_best_server_until_2012_10 | id_coordinate  |  -0.421911 |           1 | many values
 history_best_server_until_2012_10 | mnc            |          3 |    0.314319 | {0.4349,0.402433,0.162667}
 history_best_server_until_2012_10 | id_cell        |       5617 |  -0.0715787 | many values
 history_best_server_until_2012_10 | rx_level       |      14129 |   -0.115288 | many values
 history_best_server_until_2012_10 | rx_quality     |         22 |    0.368943 | many values
 history_best_server_until_2012_10 | sqi            |       5320 |    0.226596 | many values

Table definition for gis.coordinate

                                                  Table "gis.coordinate"
    Column     |   Type   |                               Modifiers                                | Storage | Description
---------------+----------+------------------------------------------------------------------------+---------+-------------
 id_coordinate | integer  | not null default nextval('gis.coordinate_id_coordinate_seq'::regclass) | plain   |
 location      | geometry |                                                                        | main    |

Indexes:
    "coordinate_pkey" PRIMARY KEY, btree (id_coordinate)
    "ix_pk_coordinate" UNIQUE, btree (id_coordinate) CLUSTER
    "ix_coordinate_location" gist (location)

Check constraints:
    "enforce_dims_location" CHECK (ndims(location) = 2)
    "enforce_geotype_location" CHECK (geometrytype(location) = 'POINT'::text OR location IS NULL)
    "enforce_srid_location" CHECK (srid(location) = 2710)
MlinarG
  • 41
  • 3
  • I would expect that the slower one is doing a conversion of type when you add the where clause. – bowlturner Dec 11 '14 at 15:34
  • Nope, that's not the cause. I changed mnc type to integer on both tables and the problem is still the same. Select on one table works fast, on the other it is still slow. – MlinarG Dec 11 '14 at 15:56
  • are the stats up to date? If it's not doing a conversion then it's just not using the right indexes. Not familiar with postgresql but most let you force the use of an index, so maybe either update the stats on the lagging table or try forcing the index to see if they fix the time issue. – bowlturner Dec 11 '14 at 16:00
  • Can you also provide the table definition for `gis.coordinate`? And (as *always*) your version of Postgres. – Erwin Brandstetter Dec 11 '14 at 16:44
  • I updated the question with table definition for gis.coordinate and stats table. My version of postgres is 9.1.13. – MlinarG Dec 11 '14 at 16:47
  • What do you get for `SELECT ((count(mnc = 41 OR NULL) * 100.0)/ count(*))::int FROM tbl;` in each table? – Erwin Brandstetter Dec 11 '14 at 16:55
  • 43% for slow table, 41% for fast one. – MlinarG Dec 11 '14 at 17:01
  • Ah, I see 8M rows for each history table. And the cardinality for the coordinate table? – Erwin Brandstetter Dec 11 '14 at 17:20

2 Answers2

2

It's not the same data, so it's unreasonable to expect the same plan unless the statistics (the amount of rows where mnc = 41 and so forth, how the values are spread out throughout the table, etc.) are similar.

It's very possible that the value is frequent and spread all over the place in one case, and narrowly grouped in another. In the first case, it would usually be faster to seq scan for rows; in the other, it would usually be faster to index scan.

Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
  • I updated the question with stats table for both tables. Maybe I'm wrong, but i assume that statistics are similar enough to use the same query plan. – MlinarG Dec 11 '14 at 16:58
  • Ya, so the frequencies are more or less similar (assuming the higher to lower percentages represent the same values) but it could still relate to data distribution or something like that. My hunch, if any, is that the excellent answer @ErwinBrandstetter posted is correct in suggesting to investigate whether your current clustered index is valid and useful -- but then, make sure to replace it with something as useful for other queries you're running on the table. – Denis de Bernardy Dec 11 '14 at 19:19
1

The plan to get rows from table coordinate is the same in both cases.

In the slower query on table 2012_10 Postgres reads rows from the table directly in a seq scan and hash joins to the results from coordinate.

In the faster query on table 2013_10 Postgres collects rows from the index on id_coordinate, filters the results by mnc and runs a nested loop with the results from coordinate.

Obviously Postgres expects the index to pay for the second query. 41% mnc = 41 for the fast case is slightly more selective that 43% for the slow one. Typically not enough to explain the difference, but the tipping point is somewhere. Obviously the decision to switch to a seqscan is a bad one, so I would guess your cost settings should be tuned. See below. Also I would run a test with the less frequent value for mnc.

Many details influence the decision as well as the outcome:

  • Frequency and distribution of values.
  • Table bloat from dead tuples (which you removed with CLUSTER, so we can rule that out).
  • Cache size and whether index and / or table are cached already (because of prior calls to the same table from this or other sessions). Run each query several times in tests to level the playing field.
  • Statistics Used by the Planner, collected by ANALYZE to give a realistic estimation of the situation.
  • Planner Cost Constants. If your statistics are not the problem, chances are you should tune your settings here.
  • Config settings, in particular shared_buffers, work_mem and effective_cache_size.

CLUSTER

Per documentation:

Because the planner records statistics about the ordering of tables, it is advisable to run ANALYZE on the newly clustered table. Otherwise, the planner might make poor choices of query plans.

Bold emphasis mine. This could be (part of) the answer.

Also CLUSTER on id_coordinate is not helping. Doesn't seem to improve locality of rows for the purpose of the query. I suggest you create an additional multi-column index

CREATE INDEX ix_history_?? ON history_?? (mnc, id_coordinate);

And

CLUSTER history_?? USING ix_history_??;

That should help some more - and the index should also be faster for a combined index scan instead of the filter step.

Better index in current version of Postgres

Not explaining the phenomenon, but your Postgres version 9.1.13 is getting old and a limiting factor. Many improvements since your version. Especially for big data and indexes.

With pg 9.2+ you could profit from index-only scans: include id_coordinate in the GiST index on gis.coordinate to make it a multicolumn index:

CREATE INDEX ix_coordinate_location ON gis.coordinate (id_coordinate, location)

You need the additional module btree_gist for that. Details:

Simpler query

Either way, you can simplify your query:

SELECT DISTINCT ON (1, 2)
       X(co.location) AS x
     , Y(co.location) AS y
     , tems.rx_level, tems.rx_quality, date, mnc, id_cell
FROM   tems.history_best_server_until_2012_10 AS tems
JOIN   gis.coordinate AS co USING (id_coordinate)
WHERE  co.location
    && setsrid(makeBox2d(GeomFromText('POINT(101000 461500)', 2710)
                       , GeomFromText('POINT(102400 463610)', 2710)), 2710)
AND    mnc = 41
ORDER  BY 1, 2, date DESC;

Shorter, without subquery, I don't expect much effect on performance, though.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228