How to extract the common behavior of removing extremes?

Question

For some of the reports I need to remove extremely low and high values from consideration with something like this

SELECT ...
FROM
(
  SELECT val, ntile(10) OVER(ORDER BY val) AS tile FROM table
) AS tiled_table
WHERE tile > 1 AND tile < 10

It could be extracted into stored procedure that will take table name and column name, concatenate the strings and execute the query but sometimes I need to use this procedure on a result of another query. Is there some way to do it in PostgreSQL?

score 1 · Answer 1 · answered Oct 25 '12 at 14:56

1

To solve this problem, you will need dynamic SQL. In general, you cannot have a dynamic table name in a query. Refer to http://www.postgresql.org/docs/9.2/static/ecpg-dynamic.html, for information on dynamic SQL.

If you construct the query as:

FROM
(
  SELECT val, ntile(10) OVER(ORDER BY val) AS tile FROM <subquery> t
) AS tiled_table

Then this wil work when the subquery is enclosed in parentheses. You can add the parens if they are not in the original query.

answered Oct 25 '12 at 14:56

Gordon Linoff

1,242,037
58
646
786

There are drawbacks though — if you make a function like this you'll not be able to use prepared statements in your query and you'll need to quote your query parameters in client, which is easy to mess up. – Tometzky Oct 25 '12 at 15:03
@Tometzky Yep. It's a potential giant SQL injection security hole that requires the client to be very paranoid about identifier quoting, etc. – Craig Ringer Oct 26 '12 at 00:01

score 1 · Answer 2 · 2012-10-25T15:11:03.310

1

If you always return the same set of columns, you can create a set returning function that you pass table and column name. That can be used just like a table:

create or replace function get_values(tablename text, columnname text)
  returns table (id integer, foobar text)
as
$$
BEGIN
  RETURN QUERY EXECUTE 'select id, '||columname||' as foobar from '||tablename;
END;
$$
language plpgsql;

Then whenever you need the values, you use:

select v.*,
       t.foobar
from get_values('table_1', 'some_column') v
  join table_2 t on ...

If your query returns a different number of columns eacht time you need it, this will not work.

edited Oct 25 '12 at 15:11

answered Oct 25 '12 at 15:04

This (string concatenation) is exactly what I'm trying to avoid. – synapse Oct 25 '12 at 18:27
@synapse: if you want to specify the column and table name as a parameter I don't see any way around dynamic SQL (unless you have a **very** limited set of choice in which case you might get away with a simple `if` cascade and "hardcoded" statements. But given the nature of your question, why do you want to avoid dynamic SQL? – Oct 25 '12 at 18:28
@synapse If you need dynamic SQL - and if you want dynamic column or table names you need it - then you're pretty much stuck with string concatenation. Rather than using `||quote_ident(...)` though, I'd advise using `format` with `%I`. See http://stackoverflow.com/a/12995424/398670 for an example. – Craig Ringer Oct 25 '12 at 23:59

score 0 · Accepted Answer · edited May 23 '17 at 12:27

If you want to use a function to process result sets from a query, your best option is to SELECT ... INTO TEMPORARY TABLE then call the function with the temporary table name.

It's annoyingly difficult to consume a rowset in a PostgreSQL function given how trivial rowsets are to produce. The only ways I'm aware of to do it are by consuming a refcursor, processing a temporary table, or by implementing an aggregate or window function. The latter two options don't allow you control over how many rows get returned, so they aren't suitable for your purposes.

Functions cannot refer to common table expression aliases in CTEs that call the function, so you can't use a CTE to create a virtual table and pass the name of the table to the function. Example showing it doesn't work:

CREATE OR REPLACE FUNCTION dynsql(tname text, colname text) RETURNS SETOF RECORD AS 
$$
BEGIN
  RETURN QUERY EXECUTE format('SELECT %I FROM %I', colname, tname);
END;
$$ LANGUAGE plpgsql;

WITH dummy(col) AS (VALUES (1),(2),(3))
SELECT * FROM dynsql('dummy','col') t(id integer);

Result:

ERROR:  relation "dummy" does not exist

... because the alias in a WITH expression is local to the WITH expression. (It'd be nice to be able to refer to it from functions, but that'd also create all sorts of exciting name clash problems and security issues with SECURITY DEFINER functions.)

While you can write a PL/PgSQL function that consumes a refcursor, this requires you to DECLARE a cursor with a query and pass it to the function. You can't just use ordinary function call syntax. It's also quite inefficient and requires LOOPing in the function. I don't think it'd be much help.

When implementing the function, use EXECUTE format(...) USING ... to keep the dynamic SQL not-too-horrible. See this earlier answer.

How to extract the common behavior of removing extremes?

3 Answers3