
I have a long query (~200 lines) that I have embedded in a function:

CREATE FUNCTION spot_rate(base_currency character(3),
                          contra_currency character(3),
                          pricing_date date) RETURNS numeric(20,8)

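For context, the full definition (whose body is elided in the question) would look roughly like this; the body shown is an illustrative placeholder, not the actual code, and as the comments below note, the real function was declared without a volatility category:

```sql
-- Illustrative sketch only: the real ~200-line body is not shown in the question.
-- Without an explicit volatility category, PostgreSQL assumes VOLATILE;
-- a read-only function could instead be marked STABLE, which gives the
-- planner more freedom (e.g. to avoid re-evaluating it per row).
CREATE OR REPLACE FUNCTION spot_rate(base_currency  character(3),
                                     contra_currency character(3),
                                     pricing_date    date)
RETURNS numeric(20,8) AS $$
    SELECT xxx  -- the ~200-line query goes here
$$ LANGUAGE sql;
```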
Whether I run the query directly or through the function, I get similar results and similar performance. So far, so good.

Now I have another long query that looks like:

SELECT x, sum(y * spot_rates.spot)
FROM (SELECT a, b, sum(c) FROM t1 JOIN t2 etc. (6 joins here)) AS table_1,
     (SELECT
        currency,
        spot_rate(currency, 'USD', current_date) AS "spot"
      FROM (SELECT DISTINCT currency FROM table_2) AS "currencies"
     ) AS "spot_rates"
WHERE
     table_1.currency = spot_rates.currency
GROUP BY / ORDER BY

This query runs in 300 ms, which is slowish but fast enough at this stage (and probably makes sense given the number of rows and aggregation operations).

If, however, I replace `spot_rate(currency, 'USD', current_date)` with its equivalent query, it runs in 5+ seconds.

Running the subquery alone returns in ~200ms whether I use the function or the equivalent query.

Why would the query run more slowly than the function when used as a subquery?

PS: I hope there is a generic answer to this generic problem; if not, I'll post more details, but creating a contrived example is not straightforward.


EDIT: `EXPLAIN ANALYZE` run on the two subqueries and the whole queries
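The plans were captured with statements along these lines (a sketch; adding `BUFFERS` is optional but also reports block I/O):

```sql
-- Function variant of the spot-rate subquery; swap in the ~200-line
-- equivalent query in place of the function call to compare the two plans.
EXPLAIN (ANALYZE, BUFFERS)
SELECT currency,
       spot_rate(currency, 'USD', current_date) AS spot
FROM (SELECT DISTINCT currency FROM table_2) AS currencies;
```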

assylias
  • Is `spot_rate` function `VOLATILE`, `STABLE` or `IMMUTABLE`? – Eggplant May 08 '15 at 12:49
  • @Eggplant I don't think any of them - it's declared like this: `CREATE OR REPLACE FUNCTION spot_rate(args) RETURNS numeric(20,8) AS $$ SELECT xxx $$ LANGUAGE SQL;` – assylias May 08 '15 at 12:52
  • You can easily check this in pgAdmin by clicking on the function name in the object browser and looking at the definition in the SQL pane. Anyway, if you haven't specified any, PostgreSQL assumes `VOLATILE`. If you had defined it as `STABLE` or `IMMUTABLE`, the answer to your question would have been easily guessable, but since you haven't, it looks counter-intuitive really; one would expect the opposite. – Eggplant May 08 '15 at 13:01
  • you are not the first to ask a question like that: http://stackoverflow.com/questions/28227189/sql-function-very-slow-compared-to-query-without-function-wrapper – bovino Marcelo Bezerra May 08 '15 at 13:09
  • http://stackoverflow.com/questions/9305133/why-does-postgresql-treat-my-query-differently-in-a-function – bovino Marcelo Bezerra May 08 '15 at 13:09
  • @MarceloBezerra The OP is experiencing the opposite. – Eggplant May 08 '15 at 13:13
  • I would ask you to provide the `EXPLAIN (analyze)` output of all query variants. You can paste your output into http://explain.depesz.com and post the links here. – vyegorov May 08 '15 at 13:41
  • @Eggplant yes indeed - I came across those 2 questions but they don't seem to help - added EXPLAIN. – assylias May 08 '15 at 14:43
  • So the "subquery with direct query" looks pretty sketchy, but I can't tell how sketchy without an explain of just the function itself. Could you add an explain of just the function body? – Mr. Llama May 08 '15 at 14:47
  • @Mr.Llama Here you go, last one. – assylias May 08 '15 at 14:54
  • Wild guess: your query's range table is exceeding the `join_collapse_limit`, causing a suboptimal plan to be used. Try moving the subquery body (the equivalent of the function) into a CTE to keep it intact. – joop May 08 '15 at 14:55
  • @joop Wow - I don't know if that's the reason but using a CTE brings the runtime from 5000ms to 300ms, in line with the equivalent query using a function. Feel free to post your comment as an answer. – assylias May 08 '15 at 15:36
  • Okay, I will. (once you have experienced this kind of behaviour, you'll remember it the next time ;-) – joop May 08 '15 at 15:38

1 Answer


Just a wild guess: your query's range table is exceeding the `join_collapse_limit`, causing a suboptimal plan to be used.


  • Try moving the subquery body (the equivalent of the function) into a CTE to keep it intact. (CTEs are planned as a unit and are not broken up by the planner; in the PostgreSQL versions current at the time of writing they act as an optimization fence. Since PostgreSQL 12 they can be inlined unless marked `MATERIALIZED`.)

  • Pre-calculating parts of the query into (temporary) tables or materialized views can also help reduce the number of range-table entries (RTEs).

  • You could (temporarily) increase `join_collapse_limit`, but this costs extra planning time, and there is a hard practical limit: the number of possible plans grows exponentially with the size of the range table.

  • Normally you can detect this behaviour from the bad query plan (as here: fewer index scans), but you need knowledge of the schema, and a reasonable plan must be possible in the first place (read: PKs/FKs and indexes must be correct, too).
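A sketch of the CTE rewrite suggested above, using the schematic table and column names from the question (the CTE body stands in for the function's ~200-line equivalent query, which is not shown):

```sql
WITH spot_rates AS (
    -- Paste the body of spot_rate(currency, 'USD', current_date) here.
    -- Keeping it inside a CTE stops the planner from flattening its joins
    -- into the outer query's range table.
    SELECT currency,
           /* ... the function's equivalent query ... */ AS spot
    FROM (SELECT DISTINCT currency FROM table_2) AS currencies
)
SELECT table_1.x, sum(table_1.y * spot_rates.spot)
FROM (SELECT a, b, sum(c) AS y
      FROM t1 JOIN t2 ON /* ... 6 joins here ... */) AS table_1
JOIN spot_rates ON table_1.currency = spot_rates.currency
GROUP BY table_1.x
ORDER BY table_1.x;
```

Raising the limit instead would look like `SET join_collapse_limit = 12;` for the session (the default is 8; 12 is an arbitrary illustrative value), at the cost of extra planning time.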
joop