
I posted a related question, but this is another part of my puzzle.

I would like to get the OLD value of a column from a row that was UPDATEd - WITHOUT using triggers (nor stored procedures, nor any other entities outside the SQL query itself).

I have a query like this:

   UPDATE my_table
      SET processing_by = our_id_info  -- unique to this worker
    WHERE trans_nbr IN (
                        SELECT trans_nbr
                          FROM my_table
                         GROUP BY trans_nbr
                        HAVING COUNT(trans_nbr) > 1
                         LIMIT our_limit_to_have_single_process_grab
                       )
RETURNING row_id;

If I could add FOR UPDATE OF my_table at the end of the subquery, that'd be divine (and fix my other question/problem). But that won't work: FOR UPDATE can't be combined with GROUP BY (which is necessary for figuring out the count). Otherwise I could just take those trans_nbr's and run a query first to get the (soon-to-be-) former processing_by values.

I've tried doing like:

   UPDATE my_table
      SET processing_by = our_id_info -- unique to this worker
     FROM my_table old_my_table
     JOIN (
             SELECT trans_nbr
               FROM my_table
           GROUP BY trans_nbr
             HAVING COUNT(trans_nbr) > 1
              LIMIT our_limit_to_have_single_process_grab
          ) sub_my_table
       ON old_my_table.trans_nbr = sub_my_table.trans_nbr
    WHERE     my_table.trans_nbr = sub_my_table.trans_nbr
      AND my_table.processing_by = old_my_table.processing_by
RETURNING my_table.row_id, my_table.processing_by, old_my_table.processing_by

But that can't work; old_my_table is not visible outside the join; the RETURNING clause is blind to it.

I've long since lost count of all the attempts I've made; I have been researching this for literally hours.

If I could just find a bullet-proof way to lock the rows in my subquery - and ONLY those rows, and WHEN the subquery happens - all the concurrency issues I'm trying to avoid would disappear ...


UPDATE: I had a typo in the non-generic code of the above. I retried after Erwin Brandstetter suggested it should work. Since it took me so long to find this sort of solution, perhaps my embarrassment is worth it? At least this is on SO for posterity now... :>

What I now have (that works) is like this:

   UPDATE my_table
      SET processing_by = our_id_info -- unique to this worker
     FROM my_table AS old_my_table
    WHERE trans_nbr IN (
                          SELECT trans_nbr
                            FROM my_table
                        GROUP BY trans_nbr
                          HAVING COUNT(*) > 1
                           LIMIT our_limit_to_have_single_process_grab
                       )
      AND my_table.row_id = old_my_table.row_id
RETURNING my_table.row_id, my_table.processing_by, old_my_table.processing_by AS old_processing_by

The COUNT(*) is per a suggestion from Flimzy in a comment on my other (linked above) question.

Please see my other question for correctly implementing concurrency and even a non-blocking version; THIS query merely shows how to get the old and new values from an update, ignore the bad/wrong concurrency bits.

Erwin Brandstetter
pythonlarry
  • Why can't you use a rule or trigger? – Jonathan Hall Oct 28 '11 at 07:29
  • Best way IMHO is to make the old rows historic, either by explicit SQL, or by a rewrite rule or trigger. – wildplasser Oct 28 '11 at 08:37
  • @Flimzy: 1. If one didn't have access to such things (though *I* do), if it CAN be done purely in SQL/single query... 2. Rules/triggers are a whole 'nother debug enchilada. 3. Keeping it simple, straight SQL, and having One Query To Do It All does K.I.S.S. nicely. Thanks, again, for the count(*) reminder, tho! :> – pythonlarry Mar 28 '13 at 20:40
  • @wildplasser: The goal is to get back (1) what was changed and (2) what was there before the change (at least). This is handy for jobs where what will be changed isn't explicitly known up front, but the program should know what the old vals were for processing. (Outputting before/after for troubleshooting, for example.) Historic rows are unnecessary clutter, rules and triggers are not only cruft (for this use case), but also require "more" (security, access, etc.). For those who don't have/want/need these, this solution is the best. – pythonlarry Mar 28 '13 at 20:43

4 Answers


Problem

The manual explains:

The optional RETURNING clause causes UPDATE to compute and return value(s) based on each row actually updated. Any expression using the table's columns, and/or columns of other tables mentioned in FROM, can be computed. **The new (post-update) values of the table's columns are used.** The syntax of the RETURNING list is identical to that of the output list of SELECT.

Bold emphasis mine. There is no way to access the old row in a RETURNING clause. You can work around this restriction with a trigger or a separate SELECT before the UPDATE wrapped in a transaction or wrapped in a CTE as was commented.
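The separate-SELECT workaround might look like this (a sketch only, using the table and values from the example below; the FOR UPDATE lock keeps the row from changing between the two statements):

```sql
BEGIN;
-- Read (and lock) the old values first ...
SELECT tbl_id, name
FROM   tbl
WHERE  tbl_id = 3
FOR    UPDATE;   -- row lock: no concurrent UPDATE can slip in between

-- ... then update; RETURNING here yields only the new values
UPDATE tbl
SET    name = 'New Guy'
WHERE  tbl_id = 3
RETURNING tbl_id, name;
COMMIT;
```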

However, what you are trying to achieve works perfectly fine if you join to another instance of the table in the FROM clause:

Solution without concurrent writes

UPDATE tbl x
SET    tbl_id = 23
     , name = 'New Guy'
FROM   tbl y                -- using the FROM clause
WHERE  x.tbl_id = y.tbl_id  -- must be UNIQUE NOT NULL
AND    x.tbl_id = 3
RETURNING y.tbl_id AS old_id, y.name AS old_name
        , x.tbl_id          , x.name;

Returns:

 old_id | old_name | tbl_id |  name
--------+----------+--------+---------
      3 | Old Guy  |     23 | New Guy

The column(s) used to self-join must be UNIQUE NOT NULL. In the simple example, the WHERE condition is on the same column tbl_id, but that's just coincidence. Works for any conditions.
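A minimal setup to try this yourself, inferred from the example output above (not part of the original answer):

```sql
CREATE TABLE tbl (
  tbl_id int PRIMARY KEY   -- UNIQUE NOT NULL, as required for the self-join
, name   text
);
INSERT INTO tbl VALUES (3, 'Old Guy'), (4, 'Old Gal');
```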

I tested this with PostgreSQL versions from 8.4 to 13.

It's different for INSERT.

Solutions with concurrent write load

There are various ways to avoid race conditions with concurrent write operations on the same rows. (Note that concurrent write operations on unrelated rows are no problem at all.) The simple, slow and sure (but expensive) method is to run the transaction with SERIALIZABLE isolation level:

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE ... ;
COMMIT;

But that's probably overkill. And you need to be prepared to repeat the operation in case of a serialization failure.

Simpler and faster (and just as reliable with concurrent write load) is an explicit lock on the one row to be updated:

UPDATE tbl x
SET    tbl_id = 24
     , name = 'New Gal'
FROM  (SELECT tbl_id, name FROM tbl WHERE tbl_id = 4 FOR UPDATE) y 
WHERE  x.tbl_id = y.tbl_id
RETURNING y.tbl_id AS old_id, y.name AS old_name
        , x.tbl_id          , x.name;

Note how the WHERE condition moved to the subquery (again, can be anything), and only the self-join (on UNIQUE NOT NULL column(s)) remains in the outer query. This guarantees that only rows locked by the inner SELECT are processed. The WHERE conditions might resolve to a different set of rows a moment later.
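Applied to the query in the question, the same pattern might look like this (a sketch, untested, using the question's placeholder names; note that FOR UPDATE sits on the plain outer subquery, not on the one with GROUP BY, which sidesteps the restriction the question ran into):

```sql
UPDATE my_table x
SET    processing_by = our_id_info            -- unique to this worker
FROM  (SELECT row_id, processing_by
       FROM   my_table
       WHERE  trans_nbr IN (
                SELECT trans_nbr
                FROM   my_table
                GROUP  BY trans_nbr
                HAVING COUNT(*) > 1
                LIMIT  our_limit_to_have_single_process_grab
              )
       FOR    UPDATE                          -- lock exactly these rows
      ) y
WHERE  x.row_id = y.row_id                    -- row_id must be UNIQUE NOT NULL
RETURNING x.row_id, x.processing_by, y.processing_by AS old_processing_by;
```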


Erwin Brandstetter
  • @a_horse_with_no_name: Do you happen to have a 9.1 cluster at hand where you could run this little test in? To make things complete .. :) – Erwin Brandstetter Oct 28 '11 at 11:30
  • Works on 9.1 just as well (I would have been **very** surprised if it didn't) –  Oct 28 '11 at 11:36
  • @a_horse_with_no_name: Thank you! Same here, but knowing >> assuming. :) – Erwin Brandstetter Oct 28 '11 at 12:10
  • Not to pick nits, but I might suggest editing your answer, @ErwinBrandstetter, as a passer-by might stop reading the top portion ("_it can't be done_") and miss your **totally rockin' proof that it can**! Regardless, thanks, again! :o) – pythonlarry Oct 28 '11 at 13:25
  • @PythonLarry: this *edit* should help the pessimists among us. :) – Erwin Brandstetter Oct 28 '11 at 14:04
  • Cheers @Erwin Brandstetter. Next step would be to create a recursive version of this beauty ;-) I fear it's not allowed to have side effects in the recursive CTE, though. – wildplasser Oct 28 '11 at 15:50
  • It works well until it doesn't. For me it happens with running the same request with different parameters with reasonably small delay (about 10 milliseconds). In this case following requests will have old value from one of followed requests (read as 'object having different key'), not the running one. *So use that with caution in cases with heavy load!* – JLarky Aug 21 '12 at 16:03
  • @ErwinBrandstetter what about using a subquery as I suggest? http://stackoverflow.com/a/22734004/242933 – ma11hew28 Mar 29 '14 at 20:26
  • @ErwinBrandstetter Why do you need `WHERE x.tbl_id = y.tbl_id`? – ma11hew28 Dec 30 '15 at 15:53
  • @MattDiPasquale: We are joining to a second instance of the same table under a different alias. We need a join condition or it's going to be a cross join (Cartesian product). – Erwin Brandstetter Dec 30 '15 at 16:48
  • If the target table does not have `UNIQUE NOT NULL` column, we can join using the `ctid` system column; `WHERE x.ctid = y.ctid`. – Mohammad Alhashash Sep 16 '18 at 06:12
  • Every time I have a PG question all I need is to look for @ErwinBrandstetter responses. This guy is The Matrix's equivalent of Neo to PostgreSQL. – Stefan Apr 24 '20 at 03:43

You can use a SELECT subquery.

Example: Update a user's email RETURNING the old value.

  1. RETURNING Subquery

    UPDATE users SET email = 'new@gmail.com' WHERE id = 1
    RETURNING (SELECT email FROM users WHERE id = 1);
    
  2. PostgreSQL WITH Query (Common Table Expressions)

    WITH u AS (
        SELECT email FROM users WHERE id = 1
    )
    UPDATE users SET email = 'new@gmail.com' WHERE id = 1
    RETURNING (SELECT email FROM u);
    

    This has worked several times on my local database without fail, but I'm not sure if the SELECT in WITH is guaranteed to consistently execute before the UPDATE since "the sub-statements in WITH are executed concurrently with each other and with the main query."

ma11hew28
  • Do both solutions work consistently under heavy load? If yes, prove it. If not, how can they be modified so that they do? – ma11hew28 Mar 29 '14 at 20:25
  • Sounds like that should rather be a question, then. You can always link to this one for context ... – Erwin Brandstetter Mar 29 '14 at 20:31
  • @2: "All the statements are executed with the same snapshot (see Chapter 13), so they cannot "see" one another's effects on the target tables. This alleviates the effects of the unpredictability of the actual order of row updates, and means that RETURNING data is the only way to communicate changes between different WITH sub-statements and the main query." -- https://www.postgresql.org/docs/9.3/static/queries-with.html – moi Apr 08 '18 at 18:57
  • Something to note here is this only works if you're updating a single row. If the WHERE clause selects multiple rows, you get an error: "more than one row returned by a subquery used as an expression". – philh Apr 20 '22 at 08:55

The CTE variant proposed here works too.
With the comfortable means of a CTE I would be more explicit, though:

WITH sel AS (
   SELECT tbl_id, name
   FROM   tbl
   WHERE  tbl_id = 3  -- assuming unique tbl_id
   )
, upd AS (
   UPDATE tbl
   SET    name = 'New Guy'
   WHERE  tbl_id = 3
   RETURNING tbl_id, name
   )
SELECT s.tbl_id AS old_id, s.name AS old_name
     , u.tbl_id, u.name
FROM   sel s, upd u;

SELECT and UPDATE see the same snapshot of the database. The SELECT is bound to return the old values (even if you place the CTE after the CTE with the UPDATE), while the UPDATE returns the new values by definition. Voilà.

But it will be slower than my first answer.
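If more than one row can match, the cross join at the end needs to become a proper join on the key, and the UPDATE can reuse the first CTE's row set. A sketch of that variant (my addition, not from the answer; the LIKE condition is a hypothetical multi-row filter):

```sql
WITH sel AS (
   SELECT tbl_id, name
   FROM   tbl
   WHERE  name LIKE 'Old%'      -- any condition, possibly matching many rows
   )
, upd AS (
   UPDATE tbl t
   SET    name = 'New Guy'
   FROM   sel                   -- update exactly the rows the SELECT saw
   WHERE  t.tbl_id = sel.tbl_id
   RETURNING t.tbl_id, t.name
   )
SELECT s.tbl_id AS old_id, s.name AS old_name
     , u.tbl_id, u.name
FROM   sel s
JOIN   upd u USING (tbl_id);    -- pair old and new values per row
```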

Erwin Brandstetter
  • Do you see any immediate issues using the subquery method? (performance or otherwise) – calebboyd Oct 25 '16 at 04:45
  • @calebboyd: The only immediate "issue": it's faster than the CTE variant. Also consider the version defending against possible concurrency issues discussed in the other answer. – Erwin Brandstetter Oct 25 '16 at 11:54
  • I guess that might be what I was wondering -- Do you know if the subquery would exhibit the similar concurrency issues? -- may be a separate question... – calebboyd Oct 26 '16 at 22:18
  • In READ COMMITTED, when the update statement is blocked by another transaction, the select statement will not be rerun, after the other transaction commits. So it will return the original value (existing before both transactions) not the one which is actually overwritten by the update statement. I guess it depends on the use case if that matters or not. – Joe23 Jun 28 '23 at 13:36
  • @Joe23: Yes, that's an important detail to be aware of. I am addressing that in my main answer with a `FOR UPDATE` lock to avoid the race condition. – Erwin Brandstetter Jun 30 '23 at 00:00

When faced with this dilemma, I added junk columns to the table; when I update a record, I copy the old values into the junk columns (which I then return). This bloats the table a bit but avoids the need for joins.
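The idea can be sketched like this (hypothetical `old_name` column, reusing the `tbl` example from the accepted answer; in an UPDATE, every right-hand side sees the pre-update row, so `old_name` receives the old value):

```sql
-- One-time schema change: a scratch column holding the previous value
ALTER TABLE tbl ADD COLUMN old_name text;

UPDATE tbl
SET    old_name = name          -- stash the old value ...
     , name     = 'New Guy'     -- ... then overwrite it
WHERE  tbl_id = 3
RETURNING tbl_id, old_name, name;
```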

Jasen