How to perform the same aggregation on every column, without listing the columns?

Question

I have a table with N columns. Let's call them c1, c2, c3, c4, ... cN. Among multiple rows, I want to get a single row with COUNT DISTINCT(cX) for each X in [1, N].

c1 | c2 | ... | cn
0  | 4  | ... | 1

Is there a way I can do this (in a stored procedure) without writing every column name into the query manually?

Why?

We've had a problem where bugs in application servers mean we rewrite good column values with garbage inserted later. To solve this, I'm storing the information log-structure, where each row represents a logical UPDATE query. Then, when given a signal that the record is complete, I can determine if any values were (erroneously) overwritten.

An example of a single correct record in multiple rows: there is at most one value for each column.

| id | initialize_time | start_time | end_time |
| 1  | 12:00am         | NULL       | NULL     |
| 1  | 12:00am         | 1:00pm     | NULL     |
| 1  | 12:00am         | NULL       | 2:00pm   |

Reconciled row:
| 1  | 12:00am         | 1:00pm     | 2:00pm   |

An example of an irreconcilable record that I want to detect:

| id | initialize_time | start_time | end_time |
| 1  | 12:00am         | NULL       | NULL     |
| 1  | 12:00am         | 1:00pm     | NULL     |
| 1  | 9:00am          | 1:00pm     | 2:00pm   |   -- New initialize time => irreconcilable!

Are you asking how to fetch the column names, or how to do the aggregation? — Burhan Khalid, Dec 07 '12 at 09:29
I know how to fetch the column names, but I don't know what to do to expand an expression around each column name in the query. I want to avoid `SELECT COUNT DISTINCT(c1), COUNT DISTINCT(c2), ..., COUNT DISTINCT(cN) FROM ...` in my stored procedure. — Andres Jaan Tack, Dec 07 '12 at 09:31
You can do this with dynamic SQL, of course. By the way, how big is X? — dezso, Dec 07 '12 at 09:44
I would have expected that writing that statement for 30 columns would have taken less time than posting this question. — , Dec 07 '12 at 12:34
Personally I'd just add an audit trigger that logged changes to the table, fix the bugs in the application server, and not worry about this over-complex workaround. See http://wiki.postgresql.org/wiki/Audit_trigger_91plus — Craig Ringer, Dec 07 '12 at 12:43

Erwin Brandstetter · Accepted Answer · 2022-06-14T02:17:39.320

You need dynamic SQL for that, which means you have to create a function or run a DO command. Since you cannot return values directly from the latter, a plpgsql function it is:

CREATE OR REPLACE function f_count_all(_tbl text
                                     , OUT columns text[]
                                     , OUT counts bigint[])
  RETURNS record LANGUAGE plpgsql AS
$func$
BEGIN
   EXECUTE (
       SELECT 'SELECT
         ARRAY[' || string_agg(''''     || quote_ident(attname) || '''', ', ') || '] 
       , ARRAY[' || string_agg('count(' || quote_ident(attname) || ')' , ', ') || ']
       FROM ' || _tbl
       FROM   pg_attribute
       WHERE  attrelid = _tbl::regclass
       AND    attnum  >= 1           -- exclude tableoid & friends (neg. attnum)
       AND    NOT attisdropped       -- exclude deleted columns
       GROUP  BY attrelid
       )
   INTO columns, counts;
END
$func$;

Call:

SELECT * FROM f_count_all('myschema.mytable');

Returns:

columns       | counts
--------------+--------
{c1, c2, c3}  | {17, 1, 0}

More explanation and links about dynamic SQL and EXECUTE in this related question - or a couple more here on SO, try this search.

Count values for every column in a table

You could even try and return a polymorphic record type to get single columns dynamically, but that's rather complex and advanced. Probably too much effort for your case. More in this related answer.

How to perform the same aggregation on every column, without listing the columns?

Why?

1 Answers1

Linked