2

Currently using Postgres 9.1. I'm looking for a way to get the count for every column in a specific table. It has around 600 columns and looks like this:

CREATE TABLE geoproject.mes_wastab (
  invid character varying(16)
, invtype character varying(3)
, smpid integer
, smpname character varying(40)
, smpdate date
, smptime character varying(5)
, o2st_p double precision
, wspgokna double precision
, o2_p double precision
, absenkungs double precision
, klarstrom3 double precision
, ...
);

Column names should be queried from information_schema.columns. Not-null values should be counted. The result should look like this:

column_name : count

Can I query this with SQL, or do I need a function?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
user1288241
  • 43
  • 1
  • 4

2 Answers2

4

This query will create the DML statement to get what you want.

SELECT 'SELECT ' || string_agg('count(' || quote_ident(attname) || ')', ', ')
    || 'FROM '   || attrelid::regclass
FROM   pg_attribute
WHERE  attrelid = 'mytbl'::regclass
AND    attnum  >= 1           -- exclude tableoid & friends (neg. attnum)
AND    attisdropped is FALSE  -- exclude deleted columns
GROUP  BY attrelid;

Returns:

SELECT count(col1), count(col2), count(col3), ...
FROM   mytbl

You can automatically execute it, too. But not in plan SQL, you need EXECUTE in a plpgsql function or DO statement (PostgreSQL 9.0 or later) for that.

You also need Postgres 9.0 or later for the string_agg() function. In older versions, you can substitute: array_to_string(array_agg(...), ', ').

You may wonder about the special cast 'mytbl'::regclass. Read more about object identifier types in the manual.

BTW: NULL values do not add to COUNT(col) by default.

Substitute the (schema-qualified) table name for mytbl. In your case that should be:

...
WHERE  attrelid = 'geoproject.mes_wastab'::regclass
...

If you should be using mixed case or otherwise messed up identifiers (note the quotes):

...
WHERE  attrelid = '"gEopRoject"."MES_wastab"'::regclass
...
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks for your answer. But when I execute it I get an empty result. – user1288241 Mar 23 '12 at 15:37
  • @user1288241: Works for me, I tested before posting. You are aware that you need to substitute your (schema-qualified, if necessary) table name for `mytbl`? – Erwin Brandstetter Mar 23 '12 at 15:54
  • yes I did it. I have an postgresql 9.1 and the table has around 600 columns. The table looks like this: CREATE TABLE geoproject.mes_wastab ( invid character varying(16), invtype character varying(3), smpid integer, smpname character varying(40), smpdate date, smptime character varying(5), o2st_p double precision, wspgokna double precision, o2_p double precision, absenkungs double precision, klarstrom3 double precision, klarstrom2 double precision, klarstrom4 double precision, ........................ ....... ) WITH ( OIDS=FALSE ); Any idea? – user1288241 Mar 23 '12 at 16:05
  • @user1288241: It is recommended that you edit substantial information into your question. That is also much easier to read. Just click "edit" under your question. – Erwin Brandstetter Mar 23 '12 at 16:14
1

I wanted a dynamic solution for this problem, so, using the query proposed by @Erwin, here is a function that only needs the table and schema names and outputs a table like:

columns, percentage
------------------
colname1, perc1
colname2, perc2
...
colnamen, percn
CREATE OR REPLACE FUNCTION public.completeness_histogram(_tabella text, _schema text)
 RETURNS TABLE(columns text, percentage numeric)
 LANGUAGE plpgsql
AS $function$
declare 
    seed_query text;
    col_list text;
    intermediate_query text;
    final_query text;
begin

    SELECT 'SELECT ' || string_agg(concat('round(100 * count(', col
                      , ') / count(*)::numeric, 2) AS ', col_pct), E'\n     , ')
        || E'\nFROM   ' ||  tbl into seed_query
    FROM (
       SELECT quote_ident(table_schema) || '.' || quote_ident(table_name) AS tbl
            , quote_ident(column_name) AS col
            , quote_ident(column_name) AS col_pct
       FROM   information_schema.columns
       WHERE  table_name = _tabella
       and    table_schema = _schema
       ORDER  BY ordinal_position
       ) sub
    GROUP  BY tbl;

    select string_agg(col_pct, ', ') into col_list
    from (
       SELECT quote_ident(column_name) AS col_pct
       FROM   information_schema.columns
       WHERE  table_name = _tabella
       and table_schema = _schema
       ORDER  BY ordinal_position
    ) foo;

    
    intermediate_query := format('SELECT ''SELECT * FROM unnest(
      ''''{%s}''''::text[]
    , '' || string_agg(quote_literal(ARRAY[%s])
                  || ''::numeric[]'', E''\n, '')
        || E'') \n AS t(col, completezza)'' AS sql
    FROM   (
        %s
    ) foo;', col_list, col_list, seed_query);

    execute format(intermediate_query)  into final_query;

    return query execute format(final_query);
    
end;
$function$
;
Sotis
  • 176
  • 1
  • 9