We are upgrading Postgres from 8.3.8 to 9.4.1 on new hardware. A representative set of queries shows that the new system's performance ranges from 1x to 3x faster. However, one of our high-load areas is consistently slower.
EXPLAIN output
8.3.8:
Nested Loop  (cost=25.78..709859.61 rows=1 width=4) (actual time=14.972..190.591 rows=32 loops=1)
  ->  Bitmap Heap Scan on prime p  (cost=25.78..1626.92 rows=1066 width=4) (actual time=1.567..9.597 rows=10742 loops=1)
        Recheck Cond: ((pid = ANY ('{28226,53915,83421,82118397,95513866}'::integer[])) AND (tid = ANY ('{1,2,3}'::integer[])))
        Filter: (NOT deleted)
        ->  Bitmap Index Scan on FOO_IDX1  (cost=0.00..25.73 rows=1066 width=0) (actual time=1.144..1.144 rows=10742 loops=1)
              Index Cond: ((pid = ANY ('{28226,53915,83421,82118397,95513866}'::integer[])) AND (deleted = false) AND (tid = ANY ('{1,2,3}'::integer[])))
  ->  Index Scan using FOO_IDX2 on data d  (cost=0.00..663.88 rows=1 width=4) (actual time=0.017..0.017 rows=0 loops=10742)
        Index Cond: (d.pid = p.pid)
        Filter: (lower("substring"(d.value, 1, 1000)) ~~ '%something%'::text)
Total runtime: 190.639 ms
9.4.1:
Nested Loop  (cost=1.15..335959.94 rows=1 width=4) (actual time=24.712..365.057 rows=32 loops=1)
  ->  Index Scan using FOO_IDX1 on prime p  (cost=0.57..953.17 rows=1033 width=4) (actual time=0.048..13.884 rows=10741 loops=1)
        Index Cond: ((pid = ANY ('{28226,53915,83421,82118397,95513866}'::integer[])) AND (deleted = false) AND (tid = ANY ('{1,2,3}'::integer[])))
        Filter: (NOT deleted)
  ->  Index Scan using FOO_IDX2 on data d  (cost=0.57..324.29 rows=1 width=4) (actual time=0.032..0.032 rows=0 loops=10741)
        Index Cond: (pid = p.pid)
        Filter: (lower("substring"(value, 1, 1000)) ~~ '%something%'::text)
        Rows Removed by Filter: 11
Planning time: 0.940 ms
Execution time: 365.156 ms
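If buffer counts would help pin down where the extra time goes, I can re-run the 9.4 side with them (BUFFERS is 9.0+, so it is not available on the 8.3 box); the invocation would be the statement from the Statement section below wrapped as:
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.pid
FROM prime p
INNER JOIN data d ON p.pid = d.pid
WHERE LOWER(substring(d.value,1,1000)) LIKE '%something%'
  AND p.tid IN (1,2,3)
  AND p.deleted = FALSE
  AND p.ppid IN (28226, 53915, 83421, 82118397, 95513866);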
Indexes
…btree (pid);
…btree (lower("substring"(value, 1, 1000)) text_pattern_ops, fid);
…btree (lower("substring"(value, 1, 1000)), fid);
Settings
Varying the following settings over a range of values did not improve this case; a per-session pattern for trying the planner-related ones is sketched after the list.
checkpoint_completion_target = 0.5
checkpoint_segments = 32
checkpoint_timeout = 30min
cpu_index_tuple_cost = 0.005
cpu_operator_cost = 0.0025
cpu_tuple_cost = 0.01
default_statistics_target = 500 (evaluated 100 to 10000, analyse after each change)
effective_cache_size = 288GB
enable_seqscan = off
from_collapse_limit = 8
geqo = off
join_collapse_limit = 8
random_page_cost = 1.0
seq_page_cost = 1.0
shared_buffers = 96GB
work_mem = 64MB
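The planner-related settings above (the cpu_* costs, the page costs, effective_cache_size, work_mem, the enable_* switches, geqo and the collapse limits) can be varied per session while timing; the checkpoint_* settings and shared_buffers need a reload or restart. A minimal pattern (values here are just illustrative):
BEGIN;
SET LOCAL random_page_cost = 4.0;
SET LOCAL work_mem = '256MB';
-- run the EXPLAIN (ANALYZE, BUFFERS) block from the EXPLAIN output section here
ROLLBACK;  -- SET LOCAL reverts at end of transaction either way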
We see a similar result for 'something%' too.
Before we drop this in for a few years, I'd like to know if there is anything more I could do to optimise for these important cases.
Statement
SELECT p.pid
FROM prime p
INNER JOIN data d ON p.pid = d.pid
WHERE LOWER(substring(d.value,1,1000)) LIKE '%something%'
AND p.tid IN (1,2,3)
AND p.deleted = FALSE
AND p.ppid IN (28226, 53915, 83421, 82118397, 95513866)
Table definitions
Simplified and sanitized.
\d prime
 Column  |  Type   |                    Modifiers
---------+---------+--------------------------------------------------
 pid     | integer | not null default nextval('prime_seq'::regclass)
 deleted | boolean |
 ppid    | integer |
 tid     | integer |
\d data
 Column |  Type   |                   Modifiers
--------+---------+-------------------------------------------------
 pdid   | integer | not null default nextval('data_seq'::regclass)
 pid    | integer |
 value  | text    |
New test results
I've tried a range of default_statistics_target values:
default_statistics_target = 100 @ 381 ms
default_statistics_target = 500 @ 387 ms
default_statistics_target = 1000 @ 384 ms
default_statistics_target = 5000 @ 369 ms
(analyze and warm-up between test cycles)
This value can make a substantial difference in other areas of our application: 500 seems ideal, while 5000+ caused other areas to slow down 3x to 10x.
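Since pushing the global target past 500 hurts elsewhere, one option I have not tested yet would be raising the target only on the columns this query filters on, leaving default_statistics_target at 500:
ALTER TABLE prime ALTER COLUMN ppid SET STATISTICS 1000;
ALTER TABLE prime ALTER COLUMN tid  SET STATISTICS 1000;
ANALYSE prime;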
Our kit is designed such that the entire database should always be in memory.
random_page_cost = 1.0 @ 372 ms
random_page_cost = 1.1 @ 372 ms
random_page_cost = 4.0 @ 370 ms
random_page_cost = 10.0 @ 369 ms
With enable_bitmapscan = off @ 362 ms (results in the same plan, as expected)
Earlier I also tried enable_indexscan = off @ 491 ms (which of course triggered a different plan)
Yes, pg 8.3's plan uses a bitmap index scan on prime where 9.4 uses a plain index scan - which I think is the "nut" of this issue.
Thanks for the link to the related article.
The advice on column order is very interesting.
At our scale, and growing, what would the optimal field order be for the following schemas?
What is the most effective way to reorganize the column order on a loaded table to realize the benefit? (A hypothetical reordering is sketched after the type lists below.)
prime having:
integer
text
boolean
boolean
integer
integer
smallint
integer
timestamp without time zone
timestamp without time zone
timestamp without time zone
text
data having:
integer
integer
integer
text
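My reading of the column-order advice is: widest fixed-width types first (timestamps, then integers, smallint, booleans), variable-length text last. A hypothetical reordering of prime along those lines; only pid, ppid, tid and deleted are real column names, the rest are placeholders for the types listed above:
CREATE TABLE prime_reordered (
    ts_a     timestamp without time zone,  -- 8-byte aligned types first
    ts_b     timestamp without time zone,
    ts_c     timestamp without time zone,
    pid      integer NOT NULL DEFAULT nextval('prime_seq'::regclass),
    ppid     integer,
    tid      integer,
    int_d    integer,
    small_e  smallint,                     -- 2-byte
    deleted  boolean,                      -- 1-byte flags together
    bool_f   boolean,
    text_g   text,                         -- variable-length last
    text_h   text
);
-- Reordering a loaded table means a full rewrite, e.g.
--   INSERT INTO prime_reordered SELECT <columns in the new order> FROM prime;
-- then recreating the indexes and swapping the table names in one transaction.
data (three integers followed by text) already looks like the padding-friendly order.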
SELECT pid
FROM data d
JOIN prime p USING (pid)
WHERE LOWER(substring(d.value,1,1000)) LIKE '%something%'
AND p.ppid IN (28226, 53915, 83421, 82118397, 95513866)
AND p.tid IN (1, 2, 3)
AND p.deleted = FALSE;
- No measurable difference observed with this approach; same plan (+/- 5 ms)
- We generally try to reduce the scope of records searched in data first by using prime to check acl, status, etc. (prime is 1/10 the size)
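One thing I have not benchmarked yet: on 9.4 a WITH query is still an optimisation fence, so that prime-first ordering can be made explicit, roughly:
WITH keys AS (
    -- resolved first: restrict by prime (acl/status style criteria)
    SELECT pid
    FROM prime
    WHERE ppid IN (28226, 53915, 83421, 82118397, 95513866)
      AND tid IN (1, 2, 3)
      AND deleted = FALSE
)
SELECT k.pid
FROM keys k
JOIN data d ON d.pid = k.pid
WHERE LOWER(substring(d.value, 1, 1000)) LIKE '%something%';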
lower(substring(d.value,1,1000)) = 355 ms
lower(left(d.value,1000)) = 343 ms (~3% faster over multiple tests, I'll take that!)
To handle the anchored case ('something%') we have a second index using the operator class "text_pattern_ops".
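One caveat if we standardise on left(): the planner only uses an expression index when the indexed expression matches the expression in the query, so the substring-based indexes above would need left-based equivalents for the anchored case. A hypothetical definition (index name made up):
CREATE INDEX foo_idx_left ON data (lower(left(value, 1000)) text_pattern_ops, fid);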
We've evaluated multi-column GIN indexes before, but didn't realize the expected benefits. It's complex because A) there are multiple criteria to meet on acl, status, and similar, and B) we need to hit "exact phrases", which requires a phrase recheck on the results. I'm optimistic about using a full-text approach long term, but the recipes we've tried so far are not faster or more stable than the old-school BTREE approach; yet.
GIN trial 1
CREATE EXTENSION btree_gin;
CREATE INDEX FOO_IDX3 ON data USING GIN (to_tsvector('simple', lower(left(value, 1000))), pid);
ANALYSE data;
SELECT p.pid
FROM prime p
INNER JOIN data d ON p.pid = d.pid
WHERE to_tsvector('simple', lower(left(d.value, 1000))) @@ to_tsquery('simple', 'something')
AND p.tid IN (1,2,3)
AND p.deleted = FALSE
AND p.ppid IN (28226, 53915, 83421, 82118397, 95513866)
Execution time: 1034.866 ms (without phrase recheck)
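The phrase recheck mentioned above would just put the original pattern back on top of the tsquery match, roughly:
SELECT p.pid
FROM prime p
INNER JOIN data d ON p.pid = d.pid
WHERE to_tsvector('simple', lower(left(d.value, 1000))) @@ to_tsquery('simple', 'something')
  AND lower(left(d.value, 1000)) LIKE '%something%'  -- exact-phrase recheck
  AND p.tid IN (1,2,3)
  AND p.deleted = FALSE
  AND p.ppid IN (28226, 53915, 83421, 82118397, 95513866);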
GIN trial 2
CREATE EXTENSION pg_trgm;
CREATE INDEX FOO_IDX4 ON data USING GIN (left(value,1000) gin_trgm_ops, pid);
ANALYSE data;
SELECT p.pid
FROM prime p
INNER JOIN data d ON p.pid = d.pid
WHERE left(d.value,1000) LIKE '%Something%'
AND p.tid IN (1,2,3)
AND p.deleted = FALSE
AND p.ppid IN (28226, 53915, 83421, 82118397, 95513866)
Hash Join  (cost=2870.42..29050.89 rows=1 width=4) (actual time=668.333..2262.101 rows=32 loops=1)
  Hash Cond: (d.pid = p.pid)
  ->  Bitmap Heap Scan on data d  (cost=230.30..26250.04 rows=25716 width=4) (actual time=653.130..2234.736 rows=38659 loops=1)
        Recheck Cond: ("left"(value, 1000) ~~ '%Something%'::text)
        Rows Removed by Index Recheck: 146677
        Heap Blocks: exact=161810
        ->  Bitmap Index Scan on FOO_IDX4  (cost=0.00..223.87 rows=25716 width=0) (actual time=575.442..575.442 rows=185336 loops=1)
              Index Cond: ("left"(value, 1000) ~~ '%Something%'::text)
  ->  Hash  (cost=2604.33..2604.33 rows=2863 width=4) (actual time=15.158..15.158 rows=10741 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 378kB
        ->  Index Scan using FOO_IDX1 on prime p  (cost=0.57..2604.33 rows=2863 width=4) (actual time=0.064..11.737 rows=10741 loops=1)
              Index Cond: ((ppid = ANY ('{28226,53915,83421,82118397,95513866}'::integer[])) AND (deleted = false) AND (tid = ANY ('{1,2,3}'::integer[])))
              Filter: (NOT deleted)
Planning time: 1.861 ms
Execution time: 2262.210 ms
We already have an index on prime with (ppid, deleted, tid); sorry this was not clear originally.