I am trying to produce a table containing counts of non-null datapoints for columns in the "Area Health Resource File" -- which contains per-county demographic and health data.
I have reworked the data into timeseries from the provided format, resulting in a bunch of tables named "series_" for some data category foo, and rows identified by county FIPS and year (initial and final for multiyear surveys).
Now want to produce counts over the timeseries columns. So far the query I have is:
do language plpgsql $$
declare
query text;
begin
query := (with cats as (
select tcategory, format('series_%s', tcategory) series_tbl
from series_categories),
cols as (
select tcategory, series_tbl, attname col
from pg_attribute a join pg_class r on a.attrelid = r.oid
join cats c on c.series_tbl = r.relname
where attname not in ('FIPS', 'initial', 'final')
and attnum >= 0
order by tcategory, col),
scols as (
select tcategory, series_tbl, col,
format('count(%s)', quote_ident(col)) sel
from cols),
sel as (
select format(
E' (select %s tcategory, %s col, %s from %s)\n',
quote_literal(tcategory), quote_literal(col), sel, series_tbl) q
from scols)
select string_agg(q, E'union\n') from sel);
execute format(
'select * into category_column_counts from (%s) x', query);
end;
$$;
(Here the "series_categories" table has category name.)
This ... "works" but is probably hundreds of times too slow. Its doing ~10,000
individual tablescans, which could be reduced 500-fold, as there are only 20-ish
categories. I would like to use select count(col1), count(col2) ...
for each table, then "unnest" these row records and concatenate all together.
I haven't figured it out though. I looked at: https://stackoverflow.com/a/14087244/435563
for inspiration, but haven't transformed that successfully.