1

I need to create a function, which returns results of a SELECT query. This SELECT query is a JOIN of few temporary tables created inside this function. Is there any way to create such function? Here is an example (it is very simplified, in reality there are multiple temp tables with long queries):

CREATE OR REPLACE FUNCTION myfunction () RETURNS TABLE (column_a TEXT, column_b TEXT) AS $$
BEGIN
CREATE TEMPORARY TABLE raw_data ON COMMIT DROP
AS
SELECT d.column_a, d2.column_b FROM dummy_data d JOIN dummy_data_2 d2 using (id);

RETURN QUERY (select distinct column_a, column_b from raw_data limit 100);
END;
$$
LANGUAGE 'plpgsql' SECURITY DEFINER

I get error:

[Error] Script lines: 1-19 -------------------------
 ERROR: RETURN cannot have a parameter in function returning set;
 use RETURN NEXT at or near "QUERY"Position: 237

I apologize in advance for any obvious mistakes, I'm new to this.

Psql version is PostgreSQL 8.2.15 (Greenplum Database 4.3.12.0 build 1)

  • Interesting. What is the error that you are getting? – Dima Chubarov Sep 24 '17 at 13:13
  • I have typed your code into SQLFiddle and with a few syntactic tricks that might or might not be relevant to your case it seems to pass the checks and return the expected result http://sqlfiddle.com/#!15/874fa/1 – Dima Chubarov Sep 24 '17 at 13:28
  • You forgot to provide your version of Postgres, which is essential. And `cannot seem to make it run` is not an error message I have ever heard of. BTW, there are no "psql functions". Yours is a plpgsql function. Finally: what makes you think you need temp tables for this? Or even a function? – Erwin Brandstetter Sep 24 '17 at 16:23
  • hey guys! Thanks for the fast reply, added few more comments in the description, hope it helps – postgresql_beginner Sep 24 '17 at 18:58
  • That clarifies a few things! – Erwin Brandstetter Sep 24 '17 at 22:51
  • Important note - if you don't need it (statistics, indexes), don't do it! Use just `RETURN QUERY` - temporary tables are little bit expensive in Postgres (depends on load - but when load is high, then temp tables are pretty expensive). – Pavel Stehule Sep 25 '17 at 03:49
  • I feel like Willie Wonka. Don't. Stop. Don't do this. Yet you won't listen. This isn't SQL Server. Don't use functions this way. It will be slow. – Jon Roberts Sep 25 '17 at 14:45
  • Thanks for the feedback, seems like it is not possible to do with current version, but this cannot be changed. For those who argued about the approach, what would you recommend to do in this case: - Having just one single query is long (joins multiple tables) and takes a lot of time (large volume of data), quite unreadable. This is the reason behind splitting it until multiple temp tables. - Delivery (final output from the query) needs to be a function. - This function needs to be called in daily basis to get the results – postgresql_beginner Sep 26 '17 at 13:27
  • @postgresql_beginner: I added another alternative for while you are stuck in your situation. For a function that's called once a day and no concurrency or performance issues, temp tables should be ok. – Erwin Brandstetter Sep 26 '17 at 14:05

2 Answers2

2

The most recent version of Greenplum Database (5.0) is based on PostgreSQL 8.3, and it supports the RETURN QUERY syntax. Just tested your function on:

PostgreSQL 8.4devel (Greenplum Database 5.0.0-beta.10+dev.726.gd4a707c762 build dev)
A. Scherbaum
  • 809
  • 4
  • 8
1

The most probable error this could raise in Postgres:

ERROR: column "foo" specified more than once

Meaning, there is at least one more column name (other than id which is folded to one instance with the USING clause) included in both tables. This would not raise an exception in a plain SQL SELECT which tolerates duplicate output column names. But you cannot create a table with duplicate names.

The problem also applies for Greenplum (like you later declared), which is not Postgres. It was forked from PostgreSQL in 2005 and developed separately. The current Postgres manual hardly applies at all any more. Look to the Greenplum documentation.

And psql is just the standard PostgreSQL interactive terminal program. Obviously you are using the one shipped with PostgreSQL 8.2.15, but the RDBMS is still Greenplum, not Postgres.

Syntax fix (for Postgres, like you first tagged, still relevant):

CREATE OR REPLACE FUNCTION myfunction()
  RETURNS TABLE (column_a text, column_b text) AS
$func$
BEGIN
   CREATE TEMPORARY TABLE raw_data ON COMMIT DROP AS
   SELECT d.column_a, d2.column_b  -- explicit SELECT list avoids duplicate column names
   FROM   dummy_data   d
   JOIN   dummy_data_2 d2 using (id);

   RETURN QUERY
   SELECT DISTINCT column_a, column_b
   FROM   raw_data
   LIMIT  100;
END
$func$  LANGUAGE plpgsql SECURITY DEFINER;

The example wouldn't need a temp table - unless you access the temp table after the function call in the same transaction (ON COMMIT DROP). Else, a plain SQL function is better in every way. Syntax for Postgres and Greenplum:

CREATE OR REPLACE FUNCTION myfunction(OUT column_a text, OUT column_b text)
  RETURNS SETOF record AS
$func$
   SELECT DISTINCT d.column_a, d2.column_b
   FROM   dummy_data   d
   JOIN   dummy_data_2 d2 using (id)
   LIMIT  100;
$func$  LANGUAGE plpgsql SECURITY DEFINER;

Not least, it should also work for Greenplum.

The only remaining reason for this function is SECURITY DEFINER. Else you could just use the simple SQL statement (possibly as prepared statement) instead.

RETURN QUERY was added to PL/pgSQL with version 8.3 in 2008, some years after the fork of Greenplum. Might explain your error msg:

ERROR: RETURN cannot have a parameter in function returning set;
use RETURN NEXT at or near "QUERY" Position: 237

Aside: LIMIT without ORDER BY produces arbitrary results. I assume you are aware of that.


If for some reason you actually need temp tables and cannot upgrade to Greenplum 5.0 like A. Scherbaum suggested, you can still make it work in Greenplum 4.3.x (like in Postgres 8.2). Use a FOR loop in combination with RETURN NEXT.
Examples:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228