Select a dynamic set of columns from a table and get the sum for each

Question

Is it possible to do the following in Postgres:

SELECT column_name FROM information_schema WHERE table_name = 'somereport' AND data_type = 'integer';

SELECT SUM(coulmn_name[0]),SUM(coulmn_name[1]) ,SUM(coulmn_name[3]) FROM somereport;

In other words I need to select a group of columns from a table depending on certain criteria, and then sum each of those columns in the table.

I know I can do this in a loop, so I can count each column independently, but obviously that requires a query for each column returned from the information schema query. Eg:

FOR r IN select column_name from information_schema where report_view_name = 'somereport' and data_type = 'integer';
LOOP
    SELECT SUM(r.column_name) FROM somereport;
END

I don't know of a way to do this. However if the round-trip times involved in multiple `SELECT` queries is a concern, and if you're submitting SQL statements to the DB from some host language, I would suggest using a FOR loop in your host language to build up a single `SELECT` query containing all the desired columns. This way you only need 2 queries in total (one to extract the list of desired columns, and one to get the totals). — j_random_hacker, Aug 19 '12 at 18:33
One thing is for certain: there is no way to write a `SELECT` query in standard SQL so that the *number of columns* depends on the contents of another table. — j_random_hacker, Aug 19 '12 at 18:35
@j_random_hacker is correct. You need to build your query *dynamically* for that. — Erwin Brandstetter, Aug 20 '12 at 01:01
You may also be interested in this [related answer dealing with varying return types](http://stackoverflow.com/a/11751557/939860). — Erwin Brandstetter, Aug 21 '12 at 04:45

score 4 · Accepted Answer · edited May 23 '17 at 12:20

This query creates the complete DML statement you are after:

WITH x AS (
   SELECT 'public'::text     AS _schema  -- provide schema name ..
         ,'somereport'::text AS _tbl     -- .. and table name once
   )
SELECT 'SELECT ' || string_agg('sum(' || quote_ident(column_name)
                 || ') AS sum_' || quote_ident(column_name), ', ')
       || E'\nFROM   ' || quote_ident(x._schema) || '.' || quote_ident(x._tbl)
FROM   x, information_schema.columns
WHERE  table_schema = _schema
AND    table_name = _tbl
AND    data_type = 'integer'
GROUP  BY x._schema, x._tbl;

You can execute it separately or wrap this query in a plpgsql function and run the query automatically with EXECUTE:

Full automation

Tested with PostgreSQL 9.1.4

CREATE OR REPLACE FUNCTION f_get_sums(_schema text, _tbl text)
  RETURNS TABLE(names text[], sums bigint[]) AS
$BODY$
BEGIN

RETURN QUERY EXECUTE (
    SELECT 'SELECT ''{'
           || string_agg(quote_ident(c.column_name), ', ' ORDER BY c.column_name)
           || '}''::text[],
           ARRAY['
           || string_agg('sum(' || quote_ident(c.column_name) || ')'
                                                   , ', ' ORDER BY c.column_name)
           || ']
    FROM   '
           || quote_ident(_schema) || '.' || quote_ident(_tbl)
    FROM   information_schema.columns c
    WHERE  table_schema = _schema
    AND    table_name = _tbl
    AND    data_type = 'integer'
    );

END;
$BODY$
  LANGUAGE plpgsql;

Call:

SELECT unnest(names) AS name, unnest (sums) AS col_sum
FROM   f_get_sums('public', 'somereport');

Returns:

   name        | col_sum
---------------+---------
 int_col1      |    6614
 other_int_col |    8364
 third_int_col | 2720642

Explain

The difficulty is to define the RETURN type for the function, while number and names of columns returned will vary. One detail that helps a little: you only want integer columns.

I solved this by forming an array of bigint (sum(int_col) returns bigint). In addition I return an array of column names. Both sorted alphabetically by column name.

In the function call I split up these arrays with unnest() arriving at the handsome format displayed.

The dynamically created and executed query is advanced stuff. Don't get confused by multiple layers of quotes. Basically you have EXECUTE that takes a text argument containing the SQL query to execute. This text, in turn, is provided by secondary SQL query that builds the query string of the primary query.

If this is too much at once or plpgsql is rather new for you, start with this related answer where I explain the basics dealing with a much simpler function and provide links to the manual for the major features.

If performance is essential query the Postgres catalog directly (pg_catalog.pg_attributes) instead of using the standardized (but slow) information_schema.columns. Here is a simple example with pg_attributes.

This is genius. Thank you. I should be fine implementing this stuff, I've been messing with functions for a few weeks now, just don't know them inside out. There also a few things you have done here which will help me with other problems. Mainly returning dynamic amounts of columns using the array and unnest() features. Never seen that before! I will report back my findings later on today. — Aydin Hassan, Aug 20 '12 at 06:38
I think this method could work with this problem http://stackoverflow.com/questions/9467250/dynamic-column-names-in-view-postgres — Aydin Hassan, Aug 20 '12 at 06:53
Great answer. `unnest()` is new to me -- looks like it could save on some awkward `UNION ALL` clauses I've used to build sets of rows in the past! — j_random_hacker, Aug 20 '12 at 07:31

Select a dynamic set of columns from a table and get the sum for each

1 Answers1

Full automation

Explain

Linked