5

I have a table with N columns. Let's call them c1, c2, c3, c4, ... cN. Among multiple rows, I want to get a single row with COUNT DISTINCT(cX) for each X in [1, N].

c1 | c2 | ... | cn
0  | 4  | ... | 1

Is there a way I can do this (in a stored procedure) without writing every column name into the query manually?

Why?

We've had a problem where bugs in application servers mean we rewrite good column values with garbage inserted later. To solve this, I'm storing the information log-structure, where each row represents a logical UPDATE query. Then, when given a signal that the record is complete, I can determine if any values were (erroneously) overwritten.

An example of a single correct record in multiple rows: there is at most one value for each column.

| id | initialize_time | start_time | end_time |
| 1  | 12:00am         | NULL       | NULL     |
| 1  | 12:00am         | 1:00pm     | NULL     |
| 1  | 12:00am         | NULL       | 2:00pm   |

Reconciled row:
| 1  | 12:00am         | 1:00pm     | 2:00pm   |

An example of an irreconcilable record that I want to detect:

| id | initialize_time | start_time | end_time |
| 1  | 12:00am         | NULL       | NULL     |
| 1  | 12:00am         | 1:00pm     | NULL     |
| 1  | 9:00am          | 1:00pm     | 2:00pm   |   -- New initialize time => irreconcilable!
Andres Jaan Tack
  • 22,566
  • 11
  • 59
  • 78
  • Are you asking how to fetch the column names, or how to do the aggregation? – Burhan Khalid Dec 07 '12 at 09:29
  • I know how to fetch the column names, but I don't know what to do to expand an expression around each column name in the query. I want to avoid `SELECT COUNT DISTINCT(c1), COUNT DISTINCT(c2), ..., COUNT DISTINCT(cN) FROM ...` in my stored procedure. – Andres Jaan Tack Dec 07 '12 at 09:31
  • You can do this with dynamic SQL, of course. By the way, how big is X? – dezso Dec 07 '12 at 09:44
  • The total number of columns _N_ is ~30. – Andres Jaan Tack Dec 07 '12 at 09:46
  • I would have expected that writing that statement for 30 columns would have taken less time than posting this question. –  Dec 07 '12 at 12:34
  • 1
    Personally I'd just add an audit trigger that logged changes to the table, fix the bugs in the application server, and not worry about this over-complex workaround. See http://wiki.postgresql.org/wiki/Audit_trigger_91plus – Craig Ringer Dec 07 '12 at 12:43

1 Answers1

4

You need dynamic SQL for that, which means you have to create a function or run a DO command. Since you cannot return values directly from the latter, a plpgsql function it is:

CREATE OR REPLACE function f_count_all(_tbl text
                                     , OUT columns text[]
                                     , OUT counts bigint[])
  RETURNS record LANGUAGE plpgsql AS
$func$
BEGIN
   EXECUTE (
       SELECT 'SELECT
         ARRAY[' || string_agg(''''     || quote_ident(attname) || '''', ', ') || '] 
       , ARRAY[' || string_agg('count(' || quote_ident(attname) || ')' , ', ') || ']
       FROM ' || _tbl
       FROM   pg_attribute
       WHERE  attrelid = _tbl::regclass
       AND    attnum  >= 1           -- exclude tableoid & friends (neg. attnum)
       AND    NOT attisdropped       -- exclude deleted columns
       GROUP  BY attrelid
       )
   INTO columns, counts;
END
$func$;

Call:

SELECT * FROM f_count_all('myschema.mytable');

Returns:

columns       | counts
--------------+--------
{c1, c2, c3}  | {17, 1, 0}

More explanation and links about dynamic SQL and EXECUTE in this related question - or a couple more here on SO, try this search.

Related:

You could even try and return a polymorphic record type to get single columns dynamically, but that's rather complex and advanced. Probably too much effort for your case. More in this related answer.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228