
Is this possible? I'm interested in finding out which columns were specified in the UPDATE request regardless of the fact that the new value that is being sent may or may not be what is stored in the database already.

The reason I want to do this is because we have a table that can receive updates from multiple sources. Previously, we weren't recording which source the update originated from. Now the table stores which source has performed the most recent update. We can change some of the sources to send an identifier, but that isn't an option for everything. So I'd like to be able to recognize when an UPDATE request doesn't have an identifier so I can substitute in a default value.

Erwin Brandstetter
EvilAmarant7x

4 Answers


If a "source" doesn't "send an identifier", the column will be unchanged. Then you cannot detect whether the current UPDATE was done by the same source as the last one or by a source that did not change the column at all. In other words: this does not work properly.

If the "source" is identifiable by any session information function, you can work with that. Like:

NEW.column = session_user;

Unconditionally for every update.
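For instance, a minimal sketch (table name `tbl` and column name `source` are illustrative, not from the question):

```sql
-- Hypothetical sketch: stamp every updated row with the session user,
-- regardless of what the UPDATE statement sets.
CREATE OR REPLACE FUNCTION trg_tbl_set_session_source()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   NEW.source := session_user;  -- overwrite unconditionally
   RETURN NEW;
END
$func$;

CREATE TRIGGER tbl_set_session_source
  BEFORE UPDATE ON tbl
  FOR EACH ROW
  EXECUTE PROCEDURE trg_tbl_set_session_source();
```

This only helps if each "source" maps to a distinct database role or similar session property.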

General Solution

I found a way to solve the original problem.

Set the column to a default value if it's not targeted in an UPDATE (not in the SET list). The key element is a column-specific trigger, introduced with PostgreSQL 9.0, using the UPDATE OF column_name clause. The manual:

The trigger will only fire if at least one of the listed columns is mentioned as a target of the UPDATE command.

That's the only simple way I found to distinguish whether a column was updated with a new value identical to the old, versus not updated at all.

One could also parse the text returned by current_query(). But that seems cumbersome, tricky and unreliable.

Trigger functions

I assume a column source defined NOT NULL.

Step 1: Set source to NULL if unchanged:

CREATE OR REPLACE FUNCTION trg_tbl_upbef_step1()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   IF NEW.source = OLD.source THEN
      NEW.source := NULL;      -- "impossible" value (source is NOT NULL)
   END IF;

   RETURN NEW;
END
$func$;

Step 2: Revert to the old value. This trigger only fires if the column was actually targeted in the UPDATE (see below):

CREATE OR REPLACE FUNCTION trg_tbl_upbef_step2()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   IF NEW.source IS NULL THEN
      NEW.source := OLD.source;
   END IF;

   RETURN NEW;
END
$func$;

Step 3: Now we can identify the missing update and set a default value instead:

CREATE OR REPLACE FUNCTION trg_tbl_upbef_step3()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   IF NEW.source IS NULL THEN
      NEW.source := 'UPDATE default source';  -- optionally same as column default
   END IF;

   RETURN NEW;
END
$func$;

Triggers

The trigger for Step 2 is fired per column!

CREATE TRIGGER upbef_step1
  BEFORE UPDATE ON tbl
  FOR EACH ROW
  EXECUTE PROCEDURE trg_tbl_upbef_step1();

CREATE TRIGGER upbef_step2
  BEFORE UPDATE OF source ON tbl             -- key element!
  FOR EACH ROW
  EXECUTE PROCEDURE trg_tbl_upbef_step2();
    
CREATE TRIGGER upbef_step3
  BEFORE UPDATE ON tbl
  FOR EACH ROW
  EXECUTE PROCEDURE trg_tbl_upbef_step3();

db<>fiddle here

Trigger names are relevant, because they are fired in alphabetical order (all being BEFORE UPDATE)!
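To illustrate the interplay, here is a sketch of a test run (table definition and values are hypothetical, chosen to match the trigger functions above):

```sql
-- Hypothetical demo table matching the trigger setup above.
CREATE TABLE tbl (
   data   text,
   source text NOT NULL DEFAULT 'UPDATE default source'
);

INSERT INTO tbl VALUES ('a', 'endpoint_1');

-- source not in the SET list: step 1 sets it to NULL,
-- step 2 never fires, step 3 substitutes the default.
UPDATE tbl SET data = 'b';
-- source is now 'UPDATE default source'

-- source in the SET list with the same value: step 1 sets it to NULL,
-- step 2 fires (the column was targeted) and restores the old value.
UPDATE tbl SET data = 'c', source = 'UPDATE default source';
-- source is still 'UPDATE default source'
```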

The procedure could be simplified with something like "per-not-column triggers" or any other way to check the target-list of an UPDATE in a trigger. But I see no handle for this, currently (unchanged as of Postgres 14).

If source can be NULL, use any other "impossible" intermediate value and check for NULL additionally in trigger function 1:

IF OLD.source IS NOT DISTINCT FROM NEW.source THEN
    NEW.source := '#impossible_value#';
END IF;

Adapt the rest accordingly.
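For example, trigger function 2 might then become (a sketch, assuming the sentinel '#impossible_value#' never occurs as a real value; step 3 would check for the same sentinel):

```sql
-- Sketch of the adapted Step 2 for a nullable source column:
-- revert the sentinel value instead of NULL.
CREATE OR REPLACE FUNCTION trg_tbl_upbef_step2()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   IF NEW.source = '#impossible_value#' THEN
      NEW.source := OLD.source;
   END IF;

   RETURN NEW;
END
$func$;
```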

Erwin Brandstetter
  • Ah, I just tried this out, you're right. Our sources are a number of different endpoints that can be used by a number of different users. So the identifier we're interested in is a combination of which endpoint the update originated from and from which user. So the update could look like 'UPDATE table SET source = source_id, user = user_id, data = new_data' or just 'UPDATE table set data = new_data'. I guess as a hack I could check current_query() for text that would indicate if those identifying columns are being set or not. – EvilAmarant7x Jan 06 '12 at 18:15
  • @Erwin: I think your answer does the job :-) Just for interest (i.e. I did no research myself): Do you think, this might be a job for the rule system? – A.H. Jan 07 '12 at 20:47
  • @A.H.: The same restrictions apply to rules: in the [rule condition, only `NEW` and `OLD` are visible](http://www.postgresql.org/docs/current/interactive/sql-createrule.html). I see no way to distinguish the case where a column is updated with the same value it had, from the case where it is not updated at all. Do you? – Erwin Brandstetter Jan 08 '12 at 02:13
  • Ah-ha! I didn't realize that per-column triggers existed! Quite clever! – EvilAmarant7x Jan 09 '12 at 13:50
  • I only regret that I have but one upvote to offer for this answer. – pcronin Aug 06 '18 at 19:08
  • I did no research myself. Probably with `pl/perl` we can check `exists $NEW{ foo }`. The `exists` perl keyword to check if key `foo` exists at object/hash `$NEW`. It will return `true` even if `$NEW{ foo } is undefined undef` (NULL) – Eugen Konkov Jan 15 '19 at 09:24
  • @ErwinBrandstetter I haven't tried your method yet, but I can see clearly that it's going to work, very clever! I have a similar problem to yours in that my application supplies a column which identifies the application (not database) user making the update. But if the table is updated by psql/pgadmin and care is not taken to include that column, my trigger will incorrectly log a row with the ID of the user who made the previous change. Your scheme prevents that. – little_birdie Jan 31 '20 at 02:01
  • @EugenKonkov Indeed I would like to try converting my functions to pg/python and see if I can do this kind of thing, such as: "for set of column names present in both table A and table B, if NEW value does not match OLD value, insert an audit row". I do this now with pg/plsql but it's quite ugly and hard to maintain because of hard coded column names. Would like to do it totally dynamically. – little_birdie Jan 31 '20 at 02:07
  • Ok I tried this, and it did not work exactly... On my column, there is a unique constraint, and even though this constraint is deferred, it will NOT allow me to set an invalid value in the first trigger, the constraint aborts the transaction. Since my column allows NULL, I was using a value of 0 (zero) which is a fk value that I do not use, as an indicator, but this did not work, and the only choice I had was to give up being able to explicitly store NULL, so, it is not possible for the client to specify column = NULL. But NULL is stored if the column is not specified. – little_birdie Feb 03 '20 at 20:12
  • @little_birdie: Unique constraints are more restrictive. See: https://stackoverflow.com/a/10035119/939860 or https://dba.stackexchange.com/a/213625/3684 – Erwin Brandstetter Feb 03 '20 at 23:33

Another way is to exploit the JSON/JSONB functions that come with recent versions of PostgreSQL. This has the advantage of working with anything that can be converted to a JSON object (rows or any other structured data), and you don't even need to know the record type.

To find the differences between any two rows/records, you can use this little hack:

SELECT pre.key AS columname, pre.value AS prevalue, post.value AS postvalue
FROM jsonb_each(to_jsonb(OLD)) AS pre
CROSS JOIN jsonb_each(to_jsonb(NEW)) AS post
WHERE pre.key = post.key AND pre.value IS DISTINCT FROM post.value

Where OLD and NEW are the built-in records found in trigger functions representing the pre and after state respectively of the changed record. Note that I have used the table aliases pre and post instead of old and new to avoid collision with the OLD and NEW built-in objects. Note also the use of IS DISTINCT FROM instead of a simple != or <> to handle NULL values appropriately.

Of course, this will also work with any ROW constructor such as ROW(1,2,3,...) or its short-hand (1,2,3,...). It will also work with any two JSONB objects that have the same keys.

For example, consider an example with two rows (already converted to JSONB for the purposes of the example):

SELECT pre.key AS columname, pre.value AS prevalue, post.value AS postvalue
FROM jsonb_each('{"col1": "same", "col2": "prediff", "col3": 1, "col4": false}') AS pre
CROSS JOIN jsonb_each('{"col1": "same", "col2": "postdiff", "col3": 1, "col4": true}') AS post
WHERE pre.key = post.key AND pre.value IS DISTINCT FROM post.value

The query will show the columns that have changed values:

 columname | prevalue  | postvalue
-----------+-----------+------------
 col2      | "prediff" | "postdiff"
 col4      | false     | true

The cool thing about this approach is that it is trivial to filter by column. For example, imagine you ONLY want to detect changes in columns col1 and col2:

SELECT pre.key AS columname, pre.value AS prevalue, post.value AS postvalue
FROM jsonb_each('{"col1": "same", "col2": "prediff", "col3": 1, "col4": false}') AS pre
CROSS JOIN jsonb_each('{"col1": "same", "col2": "postdiff", "col3": 1, "col4": true}') AS post
WHERE pre.key = post.key AND pre.value IS DISTINCT FROM post.value
AND pre.key IN ('col1', 'col2')

The new results exclude col4 even though its value has changed:

 columname | prevalue  | postvalue
-----------+-----------+------------
 col2      | "prediff" | "postdiff"

It is easy to see how this approach can be extended in many ways. For example, say you want to throw an exception if certain columns are updated. You can achieve this with a universal trigger function, that is, one that can be applied to any/all tables, without having to know the table type:

CREATE OR REPLACE FUNCTION yourschema.yourtriggerfunction()
RETURNS TRIGGER AS
$$
DECLARE
    immutable_cols TEXT[] := ARRAY['createdon', 'createdby'];
BEGIN

    IF TG_OP = 'UPDATE' AND EXISTS(
        SELECT 1
        FROM jsonb_each(to_jsonb(OLD)) AS pre, jsonb_each(to_jsonb(NEW)) AS post
        WHERE pre.key = post.key AND pre.value IS DISTINCT FROM post.value
        AND pre.key = ANY(immutable_cols)
    ) THEN
        RAISE EXCEPTION 'Error 12345 updating table %.%. Cannot alter these immutable cols: %.',
            TG_TABLE_SCHEMA, TG_TABLE_NAME, immutable_cols;
    END IF;

    RETURN NEW;  -- required for a BEFORE trigger to let the update proceed

END
$$
LANGUAGE plpgsql VOLATILE;

You would then register the above trigger function to any and all tables you want to control via:

CREATE TRIGGER yourtriggername
BEFORE UPDATE ON yourschema.yourtable
FOR EACH ROW EXECUTE PROCEDURE yourschema.yourtriggerfunction();
Demian Martinez
  • Fantastic. I've seen something similar on SQLite too, using the JSON1 extension, although not as convenient, because there's no equivalent to `to_jsonb(OLD)` if I recall correctly. – ddevienne Nov 04 '20 at 17:24

In plpgsql you could do something like this in your trigger function:

IF NEW.column IS NULL THEN
  NEW.column = 'default value';
END IF;
Frank Heikens
  • So simple! Naturally, I could find this answer in the documentation after I read the response... – EvilAmarant7x Jan 06 '12 at 15:22
  • Will this really delete a value once it is set? I.e. if the table row contains some value for `column` then your code will only work if the statement was like `UPDATE table SET column = null, data_col=...` But this would require adapting code the questioner has no access to. – A.H. Jan 06 '12 at 17:13
  • It doesn't check the OLD data, but that is possible if needed. Just add some code. – Frank Heikens Jan 06 '12 at 17:46
  • @FrankHeikens: How would you distinguish between an update by the same source as the last update and an update that does not change the column at all? – Erwin Brandstetter Jan 06 '12 at 17:48
  • @A.H: you may be interested in the solution I found. – Erwin Brandstetter Jan 06 '12 at 19:45
  • @EvilAmarant7x: May you post link to the doc you have found? – Eugen Konkov Jan 15 '19 at 09:15

I arrived at another solution to a similar problem almost naturally, because my table already contained a column with the semantics of 'last update timestamp' (let's call it UPDT).

So I decided that any update must include new values for source and UPDT together, or neither of them. Since UPDT is intended to change on every update, under this policy the condition new.UPDT = old.UPDT implies that no source was specified with the current update, and the default one can be substituted.

If your table already has a 'last update timestamp' column, this solution is simpler than creating three triggers. I'm not sure it's a good idea to add UPDT when it isn't otherwise needed. If updates are so frequent that identical timestamps are a risk, a sequence can be used instead of a timestamp.
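A sketch of what this policy could look like as a single trigger function (column names updt and source, the default string, and now() as the timestamp source are all illustrative assumptions):

```sql
-- Hypothetical single-trigger sketch of the UPDT policy:
-- source and updt must always be set together, so an unchanged
-- updt means the UPDATE did not specify a source.
CREATE OR REPLACE FUNCTION trg_tbl_updt_default_source()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   IF NEW.updt = OLD.updt THEN
      NEW.source := 'UPDATE default source';  -- substitute the default
      NEW.updt   := now();                    -- keep the timestamp moving
   END IF;

   RETURN NEW;
END
$func$;
```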

mas.morozov