16

I'm trying to optimise a complex query in PostgreSQL 9.1.2, which calls some functions. These functions are marked STABLE or IMMUTABLE and are called several times with the same arguments in the query. I assumed PostgreSQL would be smart enough to only call them once for each set of inputs - after all, that's the point of STABLE and IMMUTABLE, isn't it? But it appears that the functions are being called multiple times. I wrote a simple function to test this, which confirms it:

CREATE OR REPLACE FUNCTION test_multi_calls1(one integer)
RETURNS integer
AS $BODY$
BEGIN
    RAISE NOTICE 'Called with %', one;
    RETURN one;
END;
$BODY$ LANGUAGE plpgsql IMMUTABLE;


WITH data AS
(
    SELECT 10 AS num
    UNION ALL SELECT 10
    UNION ALL SELECT 20
)
SELECT test_multi_calls1(num)
FROM data;

Output:

NOTICE:  Called with 10
NOTICE:  Called with 10
NOTICE:  Called with 20

Why is this happening and how can I get it to only execute the function once?

EMP
  • 59,148
  • 53
  • 164
  • 220

2 Answers2

30

The following extension of your test code is informative:

CREATE OR REPLACE FUNCTION test_multi_calls1(one integer)
RETURNS integer
AS $BODY$
BEGIN
    RAISE NOTICE 'Immutable called with %', one;
    RETURN one;
END;
$BODY$ LANGUAGE plpgsql IMMUTABLE;
CREATE OR REPLACE FUNCTION test_multi_calls2(one integer)
RETURNS integer
AS $BODY$
BEGIN
    RAISE NOTICE 'Volatile called with %', one;
    RETURN one;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;

WITH data AS
(
    SELECT 10 AS num
    UNION ALL SELECT 10
    UNION ALL SELECT 20
)
SELECT test_multi_calls1(num)
FROM data
where test_multi_calls2(40) = 40
and test_multi_calls1(30) = 30

OUTPUT:

NOTICE:  Immutable called with 30
NOTICE:  Volatile called with 40
NOTICE:  Immutable called with 10
NOTICE:  Volatile called with 40
NOTICE:  Immutable called with 10
NOTICE:  Volatile called with 40
NOTICE:  Immutable called with 20

Here we can see that while in the select-list the immutable function was called multiple times, in the where clause it was called once, while the volatile was called thrice.

The important thing isn't that PostgreSQL will only call a STABLE or IMMUTABLE function once with the same data - your example clearly shows that this is not the case - it's that it may call it only once. Or perhaps it will call it twice when it would have to call a volatile version 50 times, and so on.

There are different ways in which stability and immutability can be taken advantage of, with different costs and benefits. To provide the sort of saving you are suggesting it should make with select-lists it would have to cache the results, and then lookup each argument (or list of arguments) in this cache before either returning the cached result or calling function on a cache-miss. This would be more expensive than calling your function, even in the case where there was a high percentage of cache-hits (there could be 0% cache hits meaning this "optimisation" did extra work for absolutely no gain). It could store maybe just the last parameter and result, but again that could be completely useless.

This is especially so considering that stable and immutable functions are often the lightest functions.

With the where clause however, the immutability of test_multi_calls1 allows PostgreSQL to actually restructure the query from the plain meaning of the SQL given:

For every row calculate test_multi_calls1(30) and if the result is equal to 30 continue processing the row in question

To a different query plan entirely:

Calculate test_multi_calls1(30) and if it is equal to 30 then continue with the query otherwise return a zero row result-set without any further calculation

This is the sort of use that PostgreSQL makes of STABLE and IMMUTABLE - not the caching of results, but the rewriting of queries into different queries which are more efficient but give the same results.

Note also that test_multi_calls1(30) is called before test_multi_calls2(40) no matter what order they appear in the where clause. This means that if the first call results in no rows being returned (replace = 30 with = 31 to test) then the volatile function won't be called at all - again regardless to which is on which side of the and.

This particular sort of rewriting depends upon immutability or stability. With where test_multi_calls1(30) != num query re-writing will happen for immutable but not for merely stable functions. With where test_multi_calls1(num) != 30 it won't happen at all (multiple calls) though there are other optimisations possible:

Expressions containing only STABLE and IMMUTABLE functions can be used with index scans. Expressions containing VOLATILE functions cannot. The number of calls may or may not decrease, but much more importantly the results of the calls will then be used in a much more efficient way in the rest of the query (only really matters on large tables, but then it can make a massive difference).

In all, don't think of volatility categories in terms of memoisation, but rather in terms of giving PostgreSQL's query planner opportunities to restructure entire queries in ways that are logically equivalent (same results) but much more efficient.

Jon Hanna
  • 110,372
  • 10
  • 146
  • 251
0

According to the documentation IMMUTABLE functions will return the same value given the same arguments. Since you are feeding dynamic arguments (And not even the same once) optimizer has no reason to believe that it will get the same results and hence calls the function. Better qustion is: why is your query invoking the function multiple times if it doesn't need to?

MK.
  • 33,605
  • 18
  • 74
  • 111
  • Even if I hardcode the arguments, eg. `SELECT test_multi_calls1(10), test_multi_calls1(10)` the message is still printed twice. What is the point of the `IMMUTABLE` modifier then? – EMP Dec 16 '11 at 04:36
  • well, if this happens with hardcoded arguments you should probably contact PostgreSQL developers. – MK. Dec 16 '11 at 04:39
  • 2
    Can you think of a case where `SELECT f(x), f(x)` has any practical value other than demonstrating that `f(x)` is called twice? The query optimiser costs cpu and memory, and has to earn its keep. Looking for cases like that in every query in case they are there would degrade every query ever run except for that one. – Jon Hanna Dec 16 '11 at 12:29