1

I have a hard time understanding why I can refer to the output columns in returns table(col type).

There is a subtle bug in the below code, the order by var refers to res in returns, not to data1 which we aliased to res. res in where is always null and we get 0 rows.

Why can I refer to the column name in output?
In what cases do I want this?

CREATE OR REPLACE FUNCTION public.test(var INTEGER)
RETURNS table(res int )
LANGUAGE plpgsql
AS $function$
begin
    return query 
    select data1 res 
    from table_with_data 
    where res < var;
end
$function$
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
klundby
  • 301
  • 1
  • 3
  • 17

2 Answers2

3

Why can I refer to the column name in output

From the manual, the section about function parameters:

column_name The name of an output column in the RETURNS TABLE syntax. This is effectively another way of declaring a named OUT parameter, except that RETURNS TABLE also implies RETURNS SETOF.

What this means is that in your case res is effectively a writeable variable, which type you plan to return a set of. As any other variable without a default value assigned, it starts off as null.

In what case do I want this

You can return multiple records from a function of this type with a single return query, but another way is by a series of multiple return query or return next - in the second case, filling out the fields in a record of your output table each time. You could have expected a return statement to end the function, but in this scenario only a single return; without anything added would have that effect.

create table public.test_res (data integer);

CREATE OR REPLACE FUNCTION public.test(var INTEGER)
RETURNS table(res int )
LANGUAGE plpgsql
AS $function$
begin
    insert into public.test_res select res;--to inspect its initial value later
    select 1 into res;
    return next;
    return next;--note that res isn't reset after returning next
    return query select 2;--doesn't affect the current value of res
    return next;--returning something else earlier didn't affect res either
    return;--it will finish here
    select 3 into res;
    return next;
end
$function$;
select * from test(0);
-- res
-------
--   1
--   1
--   2
--   1
--(4 rows)
table public.test_res; --this was the initial value of res within the function
-- data
--------
-- null
--(1 row)

Which is the most useful with LOOPs

CREATE OR REPLACE FUNCTION public.test(var INTEGER)
RETURNS table(comment text,res int) LANGUAGE plpgsql AS $function$
declare  rec record;
         array_slice int[];
begin
    return query select 'return query returned these multiple records in one go', a from generate_series(1,3,1) a(a);
    res:=0;
    comment:='loop exit when res>4';
    loop exit when res>4;
        select res+1 into res;
        return next;
    end loop;
    comment:='while res between 5 and 8 loop';
    while res between 5 and 8 loop
      select res+2 into res;
      return next;
    end loop;
    comment:='for element in reverse 3 .. -3 by 2 loop';
    for element in reverse 3 .. -3 by 2 loop
      select element into res;
      return next;
    end loop;
    comment:='for <record> in <expression> loop';
    for rec in select pid from pg_stat_activity where state<>'idle' loop
      select rec.pid into res;
      return next;
    end loop;
    comment:='foreach array_slice slice 1 in array arr loop';
    foreach array_slice SLICE 1 in array ARRAY[[1,2,3],[11,12,13],[21,22,23]] loop
      select array_slice[1] into res;
      return next;
    end loop;
end
$function$;

Example results

select * from public.test(0);
--                        comment                         |  res
----------------------------------------------------------+--------
-- return query returned these multiple records in one go |      1
-- return query returned these multiple records in one go |      2
-- return query returned these multiple records in one go |      3
-- loop exit when res>4                                   |      1
-- loop exit when res>4                                   |      2
-- loop exit when res>4                                   |      3
-- loop exit when res>4                                   |      4
-- loop exit when res>4                                   |      5
-- while res between 5 and 8 loop                         |      7
-- while res between 5 and 8 loop                         |      9
-- for element in reverse 3 .. -3 by 2 loop               |      3
-- for element in reverse 3 .. -3 by 2 loop               |      1
-- for element in reverse 3 .. -3 by 2 loop               |     -1
-- for element in reverse 3 .. -3 by 2 loop               |     -3
-- for <record> in <expression> loop                      | 118786
-- foreach array_slice slice 1 in array arr loop          |      1
-- foreach array_slice slice 1 in array arr loop          |     11
-- foreach array_slice slice 1 in array arr loop          |     21
--(18 rows)
Zegarek
  • 6,424
  • 1
  • 13
  • 24
1

True, OUT parameters (including field names in a RETURNS TABLE (...) clause) are visible in all SQL DML statements in a PL/pgSQL function body, just like other variables. Find details in the manual chapters Variable Substitution and Returning from a Function for PL/pgSQL.

However, a more fundamental misunderstanding comes first here. The syntax of your nested SELECT is invalid to begin with. The PL/pgSQL variable happens to mask this problem (with a different problem). In SQL, you cannot refer to output column names (column aliases in the SELECT clause) in the WHERE clause. This is invalid:

select data1 res 
from table_with_data 
where res < var;

The manual:

An output column's name can be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.

This is different for ORDER BY, which you mention in the text, but don't include in the query. See:

Fixing immediate issue

Could be repaired like this:

CREATE OR REPLACE FUNCTION public.test1(var int)
  RETURNS TABLE(res int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY
   SELECT data1 AS res     -- column alias is just noise (or documentation)
   FROM   table_with_data
   WHERE  data1 < var;     -- original column name!
END
$func$

fiddle

See:

The column alias is just noise in this case. The name of the column returned from the function is res in any case - as defined in the RETURNS TABLE clause.

Aside: It's recommended not to omit the AS keyword for column aliases (unlike table aliases). See:

If there was actual ambiguity between column and variable name - say, you declared an OUT parameter or variable named data1 - you'd get an error message like this:

ERROR:  column reference "data1" is ambiguous
LINE 2:    select data1 
                  ^
DETAIL:  It could refer to either a PL/pgSQL variable or a table column.

Brute force fix

Could be fixed with a special command at the start of the function body:

CREATE OR REPLACE FUNCTION public.test3(var int)
  RETURNS TABLE(data1 int)
  LANGUAGE plpgsql AS
$func$
#variable_conflict use_column  -- ! to resolve conflicts
BEGIN
   RETURN QUERY
   SELECT data1
   FROM   table_with_data
   WHERE  data1 < var;         -- !
END
$func$

See:

Proper fix

Table-qualify column names, and avoid conflicting variable names to begin with.

CREATE OR REPLACE FUNCTION public.test4(_var int)
  RETURNS TABLE(res int)
  LANGUAGE plpgsql STABLE AS
$func$
BEGIN
   RETURN QUERY
   SELECT t.data1            -- table-qualify column name
   FROM   table_with_data t
   WHERE  t.data1 < _var;    -- !
END
$func$

Example:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228