3

Using Postgres 13.1, I want to apply a forward fill function to all columns of a table. The forward fill function is explained in my earlier question:

However, in that case the columns and table are specified. I want to take that code and apply it to an arbitrary table, ie. specify a table and the forward fill is applied to each of the columns.

Using this table as an example:

CREATE TABLE example(row_num int, id int, str text, val integer);
INSERT INTO example VALUES
  (1, 1, '1a', NULL)
, (2, 1, NULL,    1)
, (3, 2, '2a',    2)
, (4, 2, NULL, NULL)
, (5, 3, NULL, NULL)
, (6, 3, '3a',   31)
, (7, 3, NULL, NULL)
, (8, 3, NULL,   32)
, (9, 3, '3b', NULL)
, (10,3, NULL, NULL)
;

I start with the following working base for the function. I call it passing in some variable names. Note the first is a table name not a column name. The function takes the table name and creates an array of all the column names and then outputs the names.

create or replace function col_collect(tbl text, id text, row_num text)
    returns text[]
    language plpgsql as
$func$
declare
    tmp text[];
    col text;
begin
    select array (
            select column_name
            from information_schema."columns" c
            where table_name = tbl
            ) into tmp;
    foreach col in array tmp
    loop
        raise notice 'col: %', col;
    end loop;
    return tmp;
end
$func$;

I want to apply the "forward fill" function I got from my earlier question to each column of a table. UPDATE seems to be the correct approach. So this is the preceding function where I replace raise notice by an update using execute so I can pass in the table name:

create or replace function col_collect(tbl text, id text, row_num text)
    returns void
    language plpgsql as
$func$
declare
    tmp text[];
    col text;
begin
    select array (
            select column_name
            from information_schema."columns" c
            where table_name = tbl
            ) into tmp;
    foreach col in array tmp
    loop
        execute 'update '||tbl||' 
                set '||col||' = gapfill('||col||') OVER w AS '||col||' 
                where '||tbl||'.row_num = '||col||'.row_num
                window w as (PARTITION BY '||id||' ORDER BY '||row_num||') 
                returning *;';
    end loop;
end
$func$;

-- call the function
select col_collect('example','id','row_num')

The preceding errors out with a syntax error. I have tried many variations on this but they all fail. Helpful answers on SO were here and here. The aggregate function I'm trying to apply (as window function) is:

CREATE OR REPLACE FUNCTION gap_fill_internal(s anyelement, v anyelement)
  RETURNS anyelement
  LANGUAGE plpgsql AS
$func$
BEGIN
RETURN COALESCE(v, s);  -- that's all!
END
$func$;

CREATE AGGREGATE gap_fill(anyelement) ( 
  SFUNC = gap_fill_internal, 
  STYPE = anyelement 
);

My questions are:

  1. is this a good approach and if so what am I doing wrong; or
  2. is there a better way to do this?
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
MikeB2019x
  • 823
  • 8
  • 23
  • After reading this twice I'm still not clear what is the objective of the overall functions. Filling gap? But with what? How do you decide what is the type of column (i.e what would be a gap in a varchar column)?. One comment if you have a function that just do a `coalesce` you dont need that function, just use `coalesce` directly, makes no sense. Whether this is a good approach or not I would say this is looking like an [XY Problem](https://xyproblem.info/) – Jorge Campos Feb 09 '22 at 22:02
  • 1
    @JorgeCampos: To understand this question, you probably have to start with its precursor linked at the top: https://stackoverflow.com/q/70987689/939860 – Erwin Brandstetter Feb 09 '22 at 22:20

1 Answers1

5

What you ask is not a trivial task. You should be comfortable with PL/pgSQL. I do not advise this kind of dynamic SQL queries for beginners, too powerful.

That said, let's dive in. Buckle up!

CREATE OR REPLACE FUNCTION f_gap_fill_update(_tbl regclass, _id text, _row_num text, OUT nullable_columns int, OUT updated_rows int)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _pk  text  := quote_ident(_row_num);
   _sql text;
BEGIN   
   SELECT INTO _sql, nullable_columns
          concat_ws(E'\n'
          , 'UPDATE ' || _tbl || ' t'
          , 'SET   (' || string_agg(        quote_ident(a.attname), ', ') || ')'
          , '    = (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
          , 'FROM  (' 
          , '   SELECT ' || _pk
          , '        , ' || string_agg(format('gap_fill(%1$I) OVER w AS %1$I', a.attname), ', ')
          , '   FROM   ' || _tbl
          , format('   WINDOW w AS (PARTITION BY %I ORDER BY %s)', _id, _pk)
          , '   ) u'
          , format('WHERE t.%1$s = u.%1$s', _pk)
          , 'AND  (' || string_agg('t.' || quote_ident(a.attname), ', ') || ') IS DISTINCT FROM'
          , '     (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
          )
        , count(*) -- AS _col_ct
   FROM  (
      SELECT a.attname
      FROM   pg_attribute a
      WHERE  a.attrelid = _tbl
      AND    a.attnum > 0
      AND    NOT a.attisdropped
      AND    NOT a.attnotnull
      ORDER  BY a.attnum
      ) a;

   IF nullable_columns = 0 THEN
      RAISE EXCEPTION 'No nullable columns found in table >>%<<', _tbl;
   ELSIF _sql IS NULL THEN
      RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
   END IF;
   
   -- RAISE NOTICE '%', _sql;       -- debug
   EXECUTE _sql;              -- execute
   GET DIAGNOSTICS updated_rows = ROW_COUNT; 
END
$func$;

Example call:

SELECT * FROM f_gap_fill_update('example', 'id', 'row_num');

db<>fiddle here

The function is state of the art. Generates and executes a query of the form:

UPDATE tbl t
SET   (str, val, col1)
    = (u.str, u.val, u.col1)
FROM  (
   SELECT row_num
        , gap_fill(str) OVER w AS str, gap_fill(val) OVER w AS val
        , gap_fill(col1) OVER w AS col1
   FROM   tbl
   WINDOW w AS (PARTITION BY id ORDER BY row_num)
   ) u
WHERE t.row_num = u.row_num
AND  (t.str, t.val, t.col1) IS DISTINCT FROM
     (u.str, u.val, u.col1)

Using pg_catalog.pg_attribute instead of the information schema. See:

Note the final WHERE clause to prevent (possibly expensive) empty updates. Only rows that actually change will be written. See:

Moreover, only nullable columns (not defined NOT NULL) will even be considered, to avoid unnecessary work.

Using ROW syntax in UPDATE to keep the code simple. See:

The function returns two integer values: nullable_columns and updated_rows, reporting what the names suggest.

The function defends against SQL injection properly. See:

About GET DIAGNOSTICS:

The above function updates, but does not return rows. Here is a basic demo how to return rows of varying type:

CREATE OR REPLACE FUNCTION f_gap_fill_select(_tbl_type anyelement, _id text, _row_num text)
  RETURNS SETOF anyelement
  LANGUAGE plpgsql AS
$func$
DECLARE
   _tbl regclass := pg_typeof(_tbl_type)::text::regclass;
   _sql text;
BEGIN   
   SELECT INTO _sql
          'SELECT ' || string_agg(CASE WHEN a.attnotnull
                                  THEN format('%I', a.attname)
                                  ELSE format('gap_fill(%1$I) OVER w AS %1$I', a.attname) END
                                , ', ' ORDER BY a.attnum)
        || E'\nFROM ' || _tbl
        || format(E'\nWINDOW w AS (PARTITION BY %I ORDER BY %I)', _id, _row_num)
   FROM   pg_attribute a
   WHERE  a.attrelid = _tbl
   AND    a.attnum > 0
   AND    NOT a.attisdropped;
   
   IF _sql IS NULL THEN
      RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
   END IF;

   RETURN QUERY EXECUTE _sql;
   -- RAISE NOTICE '%', _sql;       -- debug
END
$func$;

Call (note special syntax!):

SELECT * FROM f_gap_fill_select(NULL::example, 'id', 'row_num');

db<>fiddle here

About returning a polymorphic row type:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • @Erwin Brandstetter https://dbfiddle.uk/?rdbms=postgres_14&fiddle=5e4eadb7700424d443c2bad2a9049e26 I am not getting the `RETURN COALESCE(v, s)` and aggregate function. The precursor link I can understand. – jian Feb 15 '22 at 16:25
  • @Mark: The aggregate function `gap_fill_internal()` returns the current value if it's not null, else the value from the previous row. (If arguments were not switched, we could use `COALESCE` directly.) – Erwin Brandstetter Feb 15 '22 at 17:23
  • `'SELECT ' || string_agg(CASE WHEN a.attnotnull THEN format('%I', a.attname) ELSE format('gap_fill(%1$I) OVER w AS %1$I', a.attname) END , ', ' ORDER BY a.attnum)` since a.attnotnull is true, How come else clause also get evaluated? @ErwinBrandstetter – jian Feb 16 '22 at 09:44