0

Let's assume we have this function:

create or replace function foo(a integer)
returns table (b integer, c integer)
language plpgsql
as $$
begin
    raise notice 'foo()';
    return query select a*2, a*4;
    return query select a*6, a*8;
    return query select a*10, a*12;
end;
$$;

The "raise notice 'foo()'" part will be used to know how many time the function is called.

If i call the function this way:

postgres=# SELECT i, foo(i) as bla FROM generate_series(1,3) as i;
NOTICE:  foo()
NOTICE:  foo()
NOTICE:  foo()
 i |   bla   
---+---------
 1 | (2,4)
 1 | (6,8)
 1 | (10,12)
 2 | (4,8)
 2 | (12,16)
 2 | (20,24)
 3 | (6,12)
 3 | (18,24)
 3 | (30,36)
(9 rows)

We can see that, as expected, foo() is called 3 times.

But if i call the function this way (so i actually gets foo() result in different columns):

postgres=# SELECT i, (foo(i)).* FROM generate_series(1,3) as i;
NOTICE:  foo()
NOTICE:  foo()
NOTICE:  foo()
NOTICE:  foo()
NOTICE:  foo()
NOTICE:  foo()
 i | b  | c  
---+----+----
 1 |  2 |  4
 1 |  6 |  8
 1 | 10 | 12
 2 |  4 |  8
 2 | 12 | 16
 2 | 20 | 24
 3 |  6 | 12
 3 | 18 | 24
 3 | 30 | 36
(9 rows)

We can see that foo() is called 6 times. And if foo() was returning 3 columns, it would have been called 9 times. It's pretty clear that foo() is called for every i and every column it returns.

I don't understand why postgres does not make an optimisation here. And this is a problem for me as my (real) foo() may be CPU intensive. Any idea ?

Edit: Using an "immutable" function or a function that does not return multiple rows gives the same behaviour:

create or replace function foo(a integer)
returns table (b integer, c integer, d integer)
language plpgsql
immutable
as $$
begin
raise notice 'foo';
return query select a*2, a*3, a*4;
end;
$$;

postgres=# select i, (foo(i)).* from generate_series(1,2) as i;
NOTICE:  foo
NOTICE:  foo
NOTICE:  foo
NOTICE:  foo
NOTICE:  foo
NOTICE:  foo
 i | b | c | d 
---+---+---+---
 1 | 2 | 3 | 4
 2 | 4 | 6 | 8
(2 rows)
klin
  • 112,967
  • 15
  • 204
  • 232
user368507
  • 1,388
  • 1
  • 13
  • 25

2 Answers2

3

Basically it is reasonable not to call functions that return more than one value (especially functions returning sets) in select clause. In fact postgres does not make any optimization for such a call. Place your function in from clause.

SELECT i, f.* FROM generate_series(1,3) as i, foo(i) f;

In the documentation you can find the note (emphasis mine):

Currently, functions returning sets can also be called in the select list of a query. For each row that the query generates by itself, the function returning set is invoked, and an output row is generated for each element of the function's result set. Note, however, that this capability is deprecated and might be removed in future releases.

klin
  • 112,967
  • 15
  • 204
  • 232
  • It's totally reasonable to call a function in the `select` clause - but it's unreasonable to call functions that return a **set** (=multiple rows) in the `select` clause –  Jun 23 '15 at 22:15
  • That function doesn't return a set - it returns a row. Set returning functions have different issues when invoked in `SELECT` and `LATERAL` is still preferred. – Craig Ringer Jun 24 '15 at 00:51
  • @klin Er. Yes. I plead -ENOCOFFEE. The issue with multiple calls isn't called by it returning a set, but it does return a set. – Craig Ringer Jun 24 '15 at 01:30
3

This is a known issue.

SELECT (f(x)).*

is macro-expanded at parse-time into

SELECT (f(x)).a, (f(x)).b, ...

and PostgreSQL doesn't coalesce multiple calls to the same function down to a single call.

To avoid the issue you can wrap it in another layer of subquery so that the macro-expansion occurs on a simple reference to the function's result rather than the function invocation:

select i, (f).* 
FROM (
    SELECT i, foo(i) f from generate_series(1,2) as i
) x(i, f)

or use a lateral call in the FROM clause, which is preferred for newer versions:

select i, f.*
from generate_series(1,2) as i
    CROSS JOIN LATERAL foo(i) f;

The CROSS JOIN LATERAL may be omitted, using legacy comma joins and an implicit lateral join, but I find it considerably clear to include it, especially when you're mixing other join types.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • What is "x(i, f)" ? Is it just a fancy alias or something else ? If i replace it by a random string the request still works. – user368507 Jun 24 '15 at 07:53
  • @user368507 Yeah, it specifies the table and columns returned by the subquery. The table name part is required. The columns, if omitted, default to the column names of the inner query. – Craig Ringer Jun 24 '15 at 07:58