Why?
Fast query:
-> Hash Left Join (cost=1378.60..2467.48 rows=15 width=79) (actual time=41.759..85.037 rows=1129 loops=1)
...
Filter: (unaccent(((((COALESCE(p.abrev, ''::character varying))::text || ' ('::text) || (COALESCE(p.prenome, ''::character varying))::text) || ')'::text)) ~~* (...)
Slow query:
-> Hash Left Join (cost=1378.60..2467.48 rows=1 width=79) (actual time=35.084..80.209 rows=1129 loops=1)
...
Filter: (unaccent(((((COALESCE(p.abrev, ''::character varying))::text || ' ('::text) || (COALESCE(p.prenome, ''::character varying))::text) || ')'::text)) ~~* unacc (...)
Extending the search pattern by another character causes Postgres to assume yet fewer hits. (Typically, this is a reasonable estimate.) Postgres obviously does not have precise enough statistics (none, actually, see below) to expect the same number of hits that you really get.
This causes a switch to a different query plan, which is even less optimal for the actual number of hits rows=1129
.
Solution
One way to improve the situation is to create an expression index on the expression in the predicate. This makes Postgres gather statistics for the actual expression, which can help the query even if the index itself is not used for the query. Without the index, there are no statistics for the expression at all. And if done right the index can be used for the query, that's even much better. But there are multiple problems with your current expression:
unaccent(TEXT(coalesce(p.abrev,'')||' ('||coalesce(p.prenome,'')||')')) ilike unaccent('%vicen%')
Consider this updated query, based on some assumptions about your undisclosed table definitions:
SELECT e.id
, (SELECT count(*) FROM imgitem
WHERE tabid = e.id AND tab = 'esp') AS imgs -- count(*) is faster
, e.ano, e.mes, e.dia
, e.ano::text || to_char(e.mes2, 'FM"-"00')
|| to_char(e.dia, 'FM"-"00') AS data
, pl.pltag, e.inpa, e.det, d.ano anodet
, format('%s (%s)', p.abrev, p.prenome) AS determinador
, d.tax
, coalesce(v.val,v.valf) || ' ' || vu.unit AS altura
, coalesce(v1.val,v1.valf) || ' ' || vu1.unit AS dap
, d.fam, tf.nome família, d.gen, tg.nome AS gênero, d.sp
, ts.nome AS espécie, d.inf, e.loc, l.nome localidade, e.lat, e.lon
FROM pess p -- reorder!
JOIN det d ON d.detby = p.id -- INNER JOIN !
LEFT JOIN tax tf ON tf.oldfam = d.fam
LEFT JOIN tax tg ON tg.oldgen = d.gen
LEFT JOIN tax ts ON ts.oldsp = d.sp
LEFT JOIN tax ti ON ti.oldinf = d.inf -- unused, see @joop's comment
LEFT JOIN esp e ON e.det = d.id
LEFT JOIN loc l ON l.id = e.loc
LEFT JOIN var v ON v.esp = e.id AND v.key = 265
LEFT JOIN varunit vu ON vu.id = v.unit
LEFT JOIN var v1 ON v1.esp = e.id AND v1.key = 264
LEFT JOIN varunit vu1 ON vu1.id = v1.unit
LEFT JOIN pl ON pl.id = e.pl
WHERE f_unaccent(p.abrev) ILIKE f_unaccent('%' || 'vicenti' || '%') OR
f_unaccent(p.prenome) ILIKE f_unaccent('%' || 'vicenti' || '%');
Major points
Why f_unaccent()
? Because unaccent()
can't be indexed. Read here:
I used the function outlined there to allow the following (recommended!) multicolumn functional trigram GIN index:
CREATE INDEX pess_unaccent_nome_trgm_idx ON pess
USING gin (f_unaccent(pess) gin_trgm_ops, f_unaccent(prenome) gin_trgm_ops);
If you are not familiar with trigram indexes, read this first:
And possibly:
Be sure to run the latest version of Postgres (9.5 at the time of writing). There have been substantial improvements to GIN indexes. And you'll be interested in improvements in pg_trgm 1.2, scheduled to be released with the upcoming Postgres 9.6:
Prepared statements are a common way to execute queries with parameters (especially with text from user input). Postgres has to find a plan that works best for any given parameter. Add wildcards as constants to the to the search term like this:
f_unaccent(p.abrev) ILIKE f_unaccent('%' || 'vicenti' || '%')
('vicenti'
would be replaced with a parameter.) So Postgres knows we are dealing with a pattern that is neither anchored left nor right - which would allow different strategies. Related answer with more details:
Or maybe re-plan the query for every search term (possibly using dynamic SQL in a function). But make sure planning time isn't eating any possible performance gain.
The WHERE
condition on columns in pess
contradicts the LEFT JOIN
. Postgres is forced to convert that to an INNER JOIN
. Unfortunately, the join comes late in the join tree. And since Postgres cannot reorder your joins (see below), that can become very expensive. Move the table to the first position in the FROM
clause to eliminate rows early. Following LEFT JOIN
s do not eliminate any rows by definition. But with that many tables it is important to move joins that might multiply rows to the end.
You are joining 13 tables, 12 of them with LEFT JOIN
which leaves 12!
possible combinations - or 11! * 2!
if we take the one LEFT JOIN
into account that's really an INNER JOIN
. That's too many for Postgres to evaluate all possible permutations for the best query plan. Read about join_collapse_limit
:
The default setting for join_collapse_limit
is 8, which means that Postgres won't try to reorder tables in your FROM
clause and the order of tables becomes relevant.
One way to work around this would be to split the performance-critical part into a CTE like @joop commented. Don't set join_collapse_limit
much higher or times for query planning involving many joined tables will deteriorate.
About your concatenated date named data
:
cast(cast(e.ano as varchar(4))||'-'||right('0'||cast(e.mes as varchar(2)),2)||'-'|| right('0'||cast(e.dia as varchar(2)),2) as varchar(10)) as data
Assuming you build from three numeric columns for year, month and day, which are defined NOT NULL
, use this instead:
e.ano::text || to_char(e.mes2, 'FM"-"00')
|| to_char(e.dia, 'FM"-"00') AS data
About the FM
template pattern modifier:
But really, you should store the date as data type date
to begin with.
Also simplified:
format('%s (%s)', p.abrev, p.prenome) AS determinador
Won't make the query faster, but it's much cleaner. See format()
.
First things last, all the usual advice for performance optimization applies:
If you get all of this right, you should see much faster queries for all patterns.