I have a table in my Postgres database that I'm trying to determine fill rates for (that is, I'm trying to understand how often data is/isn't missing). I need to make a function that, for each column (in a list of a couple dozen columns I've selected), counts the number and percentage of columns with non-null values.
The problem is, I don't really know how to iterate through a list of columns in a programmatic way, because I don't know how to reference a column from a string of its name. I've read about how you can use the EXECUTE
command to run dynamically-written SQL, but I haven't been able to get it to work. Here's my current function:
CREATE OR REPLACE FUNCTION get_fill_rates() RETURNS TABLE (field_name text, fill_count integer, fill_percentage float) AS $$
DECLARE
fields text[] := array['column_a', 'column_b', 'column_c'];
total_rows integer;
BEGIN
SELECT reltuples INTO total_rows FROM pg_class WHERE relname = 'my_table';
FOR i IN array_lower(fields, 1) .. array_upper(fields, 1)
LOOP
field_name := fields[i];
EXECUTE 'SELECT COUNT(*) FROM my_table WHERE $1 IS NOT NULL' INTO fill_count USING field_name;
fill_percentage := fill_count::float / total_rows::float;
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM get_fill_rates() ORDER BY fill_count DESC;
This function, as written, returns every field as having a 100% fill rate, which I know to be false. How can I make this function work?