3

In the Postgres table named my_table, I want to set all empty strings ('') across all variables to null. I have the following do-block which fails at the execute line.

I am quite new to plpgsql and do not understand why. How can I properly execute the command stored in q?

do language plpgsql
$$
DECLARE
  r record;
  q text;
BEGIN
   for r in
        select table_name, column_name
        from information_schema.columns t
        where t.table_name = 'my_table'
  loop
    q := format('UPDATE %s SET %s = NULL WHERE %s = '''';',
                    r.table_name, r.column_name, r.column_name);
    raise notice 'cleaning column: %', q;
    execute q;  -- this line fails
   end loop;
END;
$$;

PS. Any other hints for better code are also welcome :)

Mark Heckmann
  • 10,943
  • 4
  • 56
  • 88

1 Answers1

2

For a non-text column (say integer id) the query

UPDATE id SET id = NULL WHERE id = ''

will raise an error as you cannot compare integer id to text.

Use cast to text:

...
q := format('UPDATE %I SET %I = NULL WHERE %I::text = '''';',
            r.table_name, r.column_name, r.column_name);
...

As an alternative you can execute the update query only for text columns, e.g.:

...
    select table_name, column_name
    from information_schema.columns t
    where t.table_name = 'my_table'
    and data_type in ('text', 'character varying')
...
klin
  • 112,967
  • 15
  • 204
  • 232
  • 1
    This answer should be extended. First, the column could be defined as `NOT NULL`, which would cause an error. Second, the function is subject to SQL injection problems as it is written; all you need to do is define a function with a malicious table or column name. – Laurenz Albe Nov 13 '16 at 12:20
  • Thanks, @klin, you figured out the problem without me having posted the error. Nice. One **Add-on**: It appears that `raise notice` does not print anything until the whole loop has terminated. Is there a way to make it print after every loop cycle? – Mark Heckmann Nov 13 '16 at 12:25
  • @MarkHeckmann - notices are raised immediately but the way they are received by a client depends on this client. For example, in **psql** notices are shown asynchronously. – klin Nov 13 '16 at 12:42
  • @LaurenzAlbe - sql injections from information_schema? I cannot imagine I would allow a user to use an arbitrary text for a table or column name. This is a calling for troubles. – klin Nov 13 '16 at 13:25
  • @klin Whoever runs that `DO` statement is probably a superuser, so if I knew that this statement gets run and I want to escalate my privileges, I could do so as a normal user as long as I got a schema where I can create a table. For good security you should always be paranoid. – Laurenz Albe Nov 13 '16 at 19:03
  • 1
    This is a ***loaded footgun*** for SQL injection, just like @Laurenz says. Table and column names have to be treated as unsafe user input *at all times*. It's enough to have the `TEMP` privilege to smuggle in a temp table called `"some_existing_tbl_name; DROP SCHEMA public CASCADE; --"`. But even if you are not paranoid enough, it's still a bug, because non-standard names need to be escaped anyway. – Erwin Brandstetter Nov 16 '16 at 06:08
  • 1
    Yes, I should edit the answer to follow a general rule for educational reasons. I do not see however any guns here. Your example is not justified as you cannot see other people's temporary tables in information_schema. If you start to behave as a paranoid that means you not in control of the situation. I prefer to remain rational. The subject is too wide to consider here, note only that the OP selects a specific own table. – klin Nov 16 '16 at 10:03