postgres functions: when does IMMUTABLE hurt performance?

Question

For best optimization results, you should label your functions with the strictest volatility category that is valid for them.

However, I seem to have an example where this is not the case, and I'd like to understand what's going on. (Background: I'm running postgres 9.2)

I often need to convert times expressed as integer numbers of seconds to dates. I've written a function to do this:

CREATE OR REPLACE FUNCTION 
  to_datestamp(time_int double precision) RETURNS date AS $$
  SELECT date_trunc('day', to_timestamp($1))::date;
$$ LANGUAGE SQL;

Let's compare performance to otherwise identical functions, with volatility set to IMMUTABLE and to STABLE:

CREATE OR REPLACE FUNCTION 
  to_datestamp_immutable(time_int double precision) RETURNS date AS $$
  SELECT date_trunc('day', to_timestamp($1))::date;
$$ LANGUAGE SQL IMMUTABLE;

CREATE OR REPLACE FUNCTION 
  to_datestamp_stable(time_int double precision) RETURNS date AS $$
  SELECT date_trunc('day', to_timestamp($1))::date;
$$ LANGUAGE SQL STABLE;

To test this, I'll create a table of 10^6 random integers corresponding to times between 2010-01-01 and 2015-01-01

CREATE TEMPORARY TABLE random_times AS
  SELECT 1262304000 + round(random() * 157766400) AS time_int 
  FROM generate_series(1, 1000000) x;

Finally, I'll time calling the two functions on this table; on my particular box, the original takes ~6 seconds, the immutable version takes ~33 seconds, and the stable version takes ~6 seconds.

EXPLAIN ANALYZE SELECT to_datestamp(time_int) FROM random_times;

Seq Scan on random_times  (cost=0.00..20996.62 rows=946950 width=8) 
  (actual time=0.150..5493.722 rows=1000000 loops=1)
Total runtime: 6258.827 ms


EXPLAIN ANALYZE SELECT to_datestamp_immutable(time_int) FROM random_times;

Seq Scan on random_times  (cost=0.00..250632.00 rows=946950 width=8) 
  (actual time=0.211..32209.964 rows=1000000 loops=1)
Total runtime: 33060.918 ms


EXPLAIN ANALYZE SELECT to_datestamp_stable(time_int) FROM random_times;
Seq Scan on random_times  (cost=0.00..20996.62 rows=946950 width=8)
  (actual time=0.086..5295.608 rows=1000000 loops=1)
Total runtime: 6063.498 ms

What's going on here? E.g., is postgres spending time caching results when that won't actually be helpful since the arguments passed to the function are unlikely to repeat?

(I'm running postgres 9.2.)

Thanks!

UPDATE

Thanks to Craig Ringer this has been discussed on the pgsql-performance mailing list. Highlights:

Tom Lane says

[ shrug... ] Using IMMUTABLE to lie about the mutability of a function (in this case, date_trunc) is a bad idea. It's likely to lead to wrong answers, never mind performance issues. In this particular case, I imagine the performance problem comes from having suppressed the option to inline the function body ... but you should be more worried about whether you aren't getting flat-out bogus answers in other cases.

Pavel Stehule says

If I understand, a used IMMUTABLE flag disables inlining. What you see, is SQL eval overflow. My rule is - don't use flags in SQL functions, when it is possible.

I was under the impression that immutable couldn't be used with anything related to random or timezones, so I would guess that it is internally converting your function to VOLATILE at run time. — bma, Aug 13 '13 at 23:24
Hmm, the docs warn "A common error is to label a function IMMUTABLE when its results depend on a configuration parameter. For example, a function that manipulates timestamps might well have results that depend on the TimeZone setting. For safety, such functions should be labeled STABLE instead." But I'm not sure I see how that would impact performance in this way. — brahn, Aug 13 '13 at 23:42
Please show the actual plans generated by `explain analyze`. You can paste them to http://explain.depesz.com/ and put links here if desired. It's typical for `STRICT` to affect performance (it prevents inlining of SQL functions), but *adding* `IMMUTABLE` should not AFAIK. — Craig Ringer, Aug 13 '13 at 23:59
@bma I wish PostgreSQL could do that, but unfortunately it can't. If you lie about `IMMUTABLE` or get it wrong accidentally, the server won't figure that out; it has no algorithm in place to "prove" the stability of a function. — Craig Ringer, Aug 14 '13 at 00:02
@CraigRinger -- ANALYZE output added to main body; not sure how revealing it is, though. — brahn, Aug 14 '13 at 00:10
@brahn That's ... well, completely bizarre. Are you *utterly certain* that the only difference is `IMMUTABLE`? check the sources from `pg_proc` to verify. I'm inclined to raise this on the mailing list if you can provide a reproducible test case showing it with generated test data. — Craig Ringer, Aug 14 '13 at 00:11
@CraigRinger -- yup, it sure looks the same: https://gist.github.com/brahn/2f56ddde241dbc6a2ece — brahn, Aug 14 '13 at 00:18
I've raised this on pgsql-performance; thread starts here: http://www.postgresql.org/message-id/520AD240.9060508@2ndquadrant.com — Craig Ringer, Aug 14 '13 at 00:47
@CraigRinger I was thinking along the lines of "oh, we see that you are using mutable functions, this function will be defined as volatile then". Or at least that's the interpretation that I had from reading several threads over the years from the pg-hackers luminaries. Probably I misunderstood that functionality. — bma, Aug 14 '13 at 03:10
@bma I'd be really interested in message links, since I was under the impression that wasn't supported. Just if you can think of any. — Craig Ringer, Aug 14 '13 at 03:13
@CraigRinger A quick search through the archives leads me to believe that a). I was wrong about immutable-to-volatile happening magically, and b). I was thinking about recent threads about STABLE and IMMUTABLE (not necessarily VOLATILE). Here is a link where Tom Lane directly contradicts my earlier assumption: http://markmail.org/message/sekx7mjhq27vxu7d — bma, Aug 14 '13 at 03:24
@bma: I would consider it a *feature*, that you can force a function to be `IMMUTABLE`. [Here is one application](http://stackoverflow.com/questions/11005036/does-postgresql-support-accent-insensitive-collations/11007216#11007216). One needs to be aware of what he's messing with, of course. — Erwin Brandstetter, Aug 14 '13 at 10:18

Clodoaldo Neto · Answer 1 · 2013-08-14T00:27:09.073

The problem is that to_timestamp returns timestamp with time zone. If the to_timestamp function is replaced with a "manual" calculation without time zone there is no difference in performance

create or replace function to_datestamp_stable(
    time_int double precision
) returns date as $$
  select date_trunc('day', timestamp 'epoch' + $1 * interval '1 second')::date;
$$ language sql stable;

explain analyze
select to_datestamp_stable(a)
from generate_series(1, 1000000) s (a);
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series s  (cost=0.00..22.50 rows=1000 width=4) (actual time=96.962..433.562 rows=1000000 loops=1)
 Total runtime: 459.531 ms

create or replace function to_datestamp_immutable(
    time_int double precision
) returns date as $$
  select date_trunc('day', timestamp 'epoch' + $1 * interval '1 second')::date;
$$ language sql immutable;

explain analyze
select to_datestamp_immutable(a)
from generate_series(1, 1000000) s (a);
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series s  (cost=0.00..22.50 rows=1000 width=4) (actual time=94.188..433.492 rows=1000000 loops=1)
 Total runtime: 459.434 ms

Same functions using to_timestamp

create or replace function to_datestamp_stable(
    time_int double precision
) returns date as $$
  select date_trunc('day', to_timestamp($1))::date;
$$ language sql stable;

explain analyze
select to_datestamp_stable(a)
from generate_series(1, 1000000) s (a);
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series s  (cost=0.00..20.00 rows=1000 width=4) (actual time=91.924..3059.570 rows=1000000 loops=1)
 Total runtime: 3103.655 ms

create or replace function to_datestamp_immutable(
    time_int double precision
) returns date as $$
  select date_trunc('day', to_timestamp($1))::date;
$$ language sql immutable;

explain analyze
select to_datestamp_immutable(a)
from generate_series(1, 1000000) s (a);
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series s  (cost=0.00..262.50 rows=1000 width=4) (actual time=92.639..20083.920 rows=1000000 loops=1)
 Total runtime: 20149.311 ms

Interesting result! Any idea why the time zone would have this impact? — brahn, Aug 14 '13 at 00:31

postgres functions: when does IMMUTABLE hurt performance?

1 Answers1

Linked