6

Consider these two PostgreSQL functions:

CREATE OR REPLACE FUNCTION f_1 (v1 INTEGER, v2 OUT INTEGER)
AS $$
BEGIN
    v2 := v1;
END
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION f_2 (v1 INTEGER)
RETURNS TABLE(v2 INTEGER)
AS $$
BEGIN
    v2 := v1;
END
$$ LANGUAGE plpgsql;

In any "ordinary" procedural SQL language (e.g. Transact-SQL), the two types of functions would be quite different. f_1 would actually be a procedure, whereas f_2 would be a table-valued function. In SQL Server, the latter is returned from INFORMATION_SCHEMA.ROUTINES like so:

SELECT r.routine_schema, r.routine_name
FROM   information_schema.routines r
WHERE  r.routine_type = 'FUNCTION'
AND    r.data_type = 'TABLE'

In PostgreSQL, this doesn't work, however. The following query shows that there is essentially no difference between the signatures of f_1 and f_2:

SELECT r.routine_name, r.data_type, p.parameter_name, p.data_type
FROM   information_schema.routines r
JOIN   information_schema.parameters p
USING (specific_catalog, specific_schema, specific_name);

The above yields:

routine_name | data_type | parameter_name | data_type
-------------+-----------+----------------+----------
f_1          | integer   | v1             | integer
f_1          | integer   | v2             | integer
f_2          | integer   | v1             | integer
f_2          | integer   | v2             | integer

Things don't get much better when I have multiple columns returned from the functions, in case of which I don't even have a "formal" return type anymore. Just record:

CREATE OR REPLACE FUNCTION f_3 (v1 INTEGER, v2 OUT INTEGER, v3 OUT INTEGER)
AS $$
BEGIN
    v2 := v1;
END
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION f_4 (v1 INTEGER)
RETURNS TABLE(v2 INTEGER, v3 INTEGER)
AS $$
BEGIN
    v2 := v1;
END
$$ LANGUAGE plpgsql;

... I'll get:

routine_name | data_type | parameter_name | data_type
-------------+-----------+----------------+----------
f_3          | record    | v1             | integer
f_3          | record    | v2             | integer
f_3          | record    | v3             | integer
f_4          | record    | v1             | integer
f_4          | record    | v2             | integer
f_4          | record    | v3             | integer

If coming from other databases, clearly the intent of the lexical signature is quite different. As an Oracle person, I expect PROCEDURES to have side-effects, whereas FUNCTIONS don't have any side-effects (unless in an autonomous transaction) and can be safely embedded in SQL. I know that PostgreSQL cleverly treats all functions as tables, but I don't think it's a good idea to design OUT parameters as table columns in any query...

My question is:

Is there any formal difference at all between the two ways to declare functions? If there is, how can I discover it from the INFORMATION_SCHEMA or from the PG_CATALOG?

Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509

3 Answers3

3

\df public.f_* does this

select
    n.nspname as "Schema",
    p.proname as "Name",
    pg_catalog.pg_get_function_result(p.oid) as "Result data type",
    pg_catalog.pg_get_function_arguments(p.oid) as "Argument data types",
    case
        when p.proisagg then 'agg'
        when p.proiswindow then 'window'
        when p.prorettype = 'pg_catalog.trigger'::pg_catalog.regtype then 'trigger'
        else 'normal'
    end as "Type"
from
    pg_catalog.pg_proc p
    left join
    pg_catalog.pg_namespace n on n.oid = p.pronamespace
where
    p.proname ~ '^(f_.*)$'
    and n.nspname ~ '^(public)$'
order by 1, 2, 4;

which returns this

                                          List of functions
 Schema | Name |       Result data type        |            Argument data types             |  Type  
--------+------+-------------------------------+--------------------------------------------+--------
 public | f_1  | integer                       | v1 integer, OUT v2 integer                 | normal
 public | f_2  | TABLE(v2 integer)             | v1 integer                                 | normal
 public | f_3  | record                        | v1 integer, OUT v2 integer, OUT v3 integer | normal
 public | f_4  | TABLE(v2 integer, v3 integer) | v1 integer                                 | normal
(4 rows)

To drop a function it is necessary to pass its input (IN and INOUT) arguments data types. Then I guess the function name and its input arguments data types do form its signature. And to change the returned data type it is necessary to first drop it and recreate.

Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • Very nice. I suspect his goes in the same direction as [what I've discovered](http://stackoverflow.com/a/24513067/521799) about the `pg_proc.proretset` column. If psql formally runs this type of query, my question can be answered positively. The two function specifications are not the same. – Lukas Eder Jul 01 '14 at 14:59
  • @Lukas Edit about function signature – Clodoaldo Neto Jul 01 '14 at 15:04
  • 1
    `OUT` parameters are not part of the function signature. You don not need them to drop a function. – Erwin Brandstetter Jul 01 '14 at 15:08
  • @ClodoaldoNeto: Do you happen to know how `pg_get_function_result` is implemented? – Lukas Eder Jul 01 '14 at 16:00
  • To be precise: `IN` and `INOUT` arguments. I added a bit to my answer. – Erwin Brandstetter Jul 01 '14 at 16:14
  • @LukasEder: [Details for `pg_get_function_result(func_oid)`](http://www.postgresql.org/docs/current/interactive/functions-info.html#FUNCTIONS-INFO-CATALOG-TABLE) – Erwin Brandstetter Jul 01 '14 at 16:17
  • @ErwinBrandstetter: That explains *what* it does, but I'm really interested in a query that can help me formally reverse-engineer the table type. Ideally, I'd have output more or less compatible to `INFORMATION_SCHEMA.COLUMNS`, like SQL Server (I'm doing this to generate Java code with [jOOQ](http://www.jooq.org)). It's starting to get a bit out-of-scope for this particular Stack Overflow question, I know – Lukas Eder Jul 01 '14 at 16:25
  • @Lukas `pg_get_function_result` is internal. Only browsing the source will reveal it. – Clodoaldo Neto Jul 01 '14 at 16:25
  • @ClodoaldoNeto: Yep. That's what I feared :) – Lukas Eder Jul 01 '14 at 16:26
2

It appears that the pg_catalog.pg_proc.proretset flag contains a hint about whether the function returns a set (i.e. a table):

SELECT   r.routine_name, r.data_type, p.parameter_name, 
         p.data_type, pg_p.proretset
FROM     information_schema.routines r
JOIN     information_schema.parameters p
USING   (specific_catalog, specific_schema, specific_name)
JOIN     pg_namespace pg_n
ON       r.specific_schema = pg_n.nspname
JOIN     pg_proc pg_p
ON       pg_p.pronamespace = pg_n.oid
AND      pg_p.proname = r.routine_name
WHERE    r.routine_schema = 'public'
AND      r.routine_name IN ('f_1', 'f_2', 'f_3', 'f_4')
ORDER BY routine_name, parameter_name;

The above would yield:

routine_name | data_type | parameter_name | data_type | proretset
-------------+-----------+----------------+-----------+----------
f_1          | record    | v1             | integer   | f
f_1          | record    | v2             | integer   | f
f_2          | record    | v1             | integer   | t
f_2          | record    | v2             | integer   | t
f_3          | record    | v1             | integer   | f
f_3          | record    | v2             | integer   | f
f_3          | record    | v3             | integer   | f
f_4          | record    | v1             | integer   | t
f_4          | record    | v2             | integer   | t
f_4          | record    | v3             | integer   | t

INFORMATION_SCHEMA.COLUMNS emulation

For what it's worth and in case someone needs this crazy thing, here's the beautiful query I came up with to emulate SQL Server's nice INFORMATION_SCHEMA.COLUMNS implementation that returns table-valued function columns (which is what we really needed when supporting table-valued functions in jOOQ's code generator):

SELECT 
  p.proname AS TABLE_NAME,
  columns.proargname AS COLUMN_NAME, 
  ROW_NUMBER() OVER(PARTITION BY p.oid ORDER BY o.ordinal) AS ORDINAL_POSITION,
  format_type(t.oid, t.typtypmod) AS DATA_TYPE,
  information_schema._pg_char_max_length(t.oid, t.typtypmod) AS CHARACTER_MAXIMUM_LENGTH,
  information_schema._pg_numeric_precision(t.oid, t.typtypmod) AS NUMERIC_PRECISION,
  information_schema._pg_numeric_scale(t.oid,t.typtypmod) AS NUMERIC_SCALE,
  not(t.typnotnull) AS IS_NULLABLE
FROM pg_proc p,
LATERAL generate_series(1, array_length(p.proargmodes, 1)) o(ordinal),
LATERAL (
  SELECT 
    p.proargnames[o.ordinal], 
    p.proargmodes[o.ordinal], 
    p.proallargtypes[o.ordinal]
) columns(proargname, proargmode, proargtype),
LATERAL (
  SELECT pg_type.oid oid, pg_type.* 
  FROM pg_type 
  WHERE pg_type.oid = columns.proargtype
) t
WHERE p.proretset
AND proargmode = 't'
AND p.proname LIKE 'f%';

The above nicely returns (column names shortened for SO):

table_name | column_name  | ordinal | data_type | length | precision | scale | nullable
f_2        | v2           |       1 | integer   |        |        32 |     0 | t
f_4        | v2           |       1 | integer   |        |        32 |     0 | t
f_4        | v3           |       2 | integer   |        |        32 |     0 | t
Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
2

RETURNS TABLE() is effectively the same as OUT parameters combined with RETURNS SETOF ... Without the additional SETOF keyword, a function with OUT parameters always returns a single row, while a function with RETURNS TABLE() can return 0-n rows.

Your example just happens not to exhibit the difference, due to how it's written.

This is reflected in the flag proretset of the system catalog pg_proc. The manual:

Function returns a set (i.e., multiple values of the specified data type)

The information schema view information_schema.routines is not particularly helpful in this regard. That's a watered-down standardized compromise to present information in a platform-independent form, hardly fit to portray the specifics in Postgres.

Function signature

The manual:

Two functions are considered the same if they have the same names and input argument types, ignoring any OUT parameters

Where "input" argument types inlcude IN and INOUT parameters.

These two functions are instrumental when fiddling with function definitions. The manual:

pg_get_function_arguments(func_oid) ... get argument list of function's definition (with default values)
pg_get_function_identity_arguments(func_oid) ... get argument list to identify a function (without default values)

More in these related answers:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    OK, fixed the mistake about `SETOF`. About the information schema: I am not saying anything is wrong with it. It's just not "first-hand" infiormation. It's a standardized compromise that does not reflect all Postgres-specific features. – Erwin Brandstetter Jul 01 '14 at 15:11
  • Ah, very nice! `SETOF` is another great feature to return a row type from a concrete table. That's going to be quite useful as well... Good thing you undeleted your answer. – Lukas Eder Jul 01 '14 at 15:16
  • It would be really interesting to see what those `pg_get_function_xxx` functions really do, internally. Unnesting those `pg_proc.proargxxx` columns into a meaningful SQL statement doesn't seem so straightforward... – Lukas Eder Jul 01 '14 at 16:17
  • @LukasEder: I added links to answers that produce meaningful SQL statements. – Erwin Brandstetter Jul 01 '14 at 16:24
  • I've updated my own answer with what I've meant. I was really looking for a means to emulate SQL Server's understanding of `INFORMATION_SCHEMA.COLUMNS` with respect to table-valued functions. Maybe, this will eventually make it into PostgreSQL's `INFORMATION_SCHEMA` as well...? – Lukas Eder Jul 02 '14 at 09:45