9

Using Postgres, I can perform an update statement and return the rows affected by the commend.

UPDATE accounts
SET status = merge_accounts.status,
    field1 = merge_accounts.field1,
    field2 = merge_accounts.field2,
    etc.
FROM merge_accounts WHERE merge_accounts.uid =accounts.uid
RETURNING accounts.*

This will give me a list of all records that matched the WHERE clause, however will not tell me which rows were actually updated by the operation.

In this simplified use-case it of course would be trivial to simply add another guard AND status != 'Closed, however my real world use-case involves updating potentially dozens of fields from a merge table with 10,000+ rows, and I want to be able to detect which rows were actually changed, and which are identical to their previous version. (The expectation is very few rows will actually have changed).

The best I've got so far is

UPDATE accounts
SET x=..., y=...
FROM accounts as old WHERE old.uid = accounts.uid
FROM merge_accounts WHERE merge_accounts.uid = accounts.uid
RETURNING accounts, old

Which will return a tuple of old and new rows that can then be diff'ed inside my Java codebase itself - however this requires significant additional network traffic and is potentially error prone.

The ideal scenario is to be able to have postgres return just the rows that actually had any values changed - is this possible?

Here on github is a more real world example of what I'm doing, incorporating some of the suggestions so far.
Using Postgres 9.1, but can use 9.4 if required. The requirements are effectively

  • Be able to perform an upsert of new data
  • Where we may only know the specific key/value pair to update on any given row
  • Get back a result containing just the rows that were actually changed by the upsert
  • Bonus - get a copy of the old records as well.

Since this question was opened I've gotten most of this working now, although I'm unsure if my approach is a good idea or not - it's a bit hacked together.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
James Davies
  • 9,602
  • 5
  • 38
  • 42
  • Is the *row type* of `accounts` and `merge_accounts` identical? Can columns be NULL? As *always*, this would be much easier if you provided your version of Postgres and table definitions (`\d tbl` in psql or the complete `CREATE TABLE` script). – Erwin Brandstetter Jan 15 '15 at 06:38
  • `merge_accounts` is identical, save that all non-pk columns are array types, which I treat as if it was an `Option` type to differentiate between 'nothing to merge' and 'null'. The actual schema is pretty large, I'll extrapolate something more specific and throw it in a gist. Version wise - I'm happy to run whichever version is required to make this work. Currently 9.1, but planning on upgrading to 9.4 regardless. – James Davies Jan 15 '15 at 06:41
  • This is where I'm at so far, incorporating the suggestions from the first answer.https://gist.github.com/jimmydivvy/c425615646812872d300 – James Davies Jan 15 '15 at 06:57

5 Answers5

14

Only update rows that actually change

That saves expensive updates and expensive checks after the UPDATE.

To update every column with the new value provided (if anything changes):

UPDATE accounts a
SET   (status,   field1,   field2)  -- short syntax for  ..
  = (m.status, m.field1, m.field2)  -- .. updating multiple columns
FROM   merge_accounts m
WHERE  m.uid = a.uid
AND   (a.status IS DISTINCT FROM m.status OR
       a.field1 IS DISTINCT FROM m.field1 OR 
       a.field2 IS DISTINCT FROM m.field2)
RETURNING a.*;

Due to PostgreSQL's MVCC model any change to a row writes a new row version. Updating a single column is almost as expensive as updating every column in the row at once. Rewriting the rest of the row comes at practically no cost, as soon as you have to update anything.

Details:

Shorthand for whole rows

If the row types of accounts and merge_accounts are identical and you want to adopt everything from merge_accounts into accounts, there is a shortcut comparing the whole row type:

UPDATE accounts a
SET   (status,   field1,   field2)
  = (m.status, m.field1, m.field2)
FROM   merge_accounts m
WHERE  a.uid = m.uid
AND    m IS DISTINCT FROM a
RETURNING a.*;

This even works for NULL values. Details in the manual.
But it's not going to work for your home-grown solution where (quoting your comment):

merge_accounts is identical, save that all non-pk columns are array types

It requires compatible row types, i.e. each column shares the same data type or there is at least an implicit cast between the two types.

For your special case

UPDATE accounts a
SET   (status, field1, field2)
    = (COALESCE(m.status[1], a.status)  -- default to original ..
     , COALESCE(m.field1[1], a.field1)   -- .. if m.column[1] IS NULL
     , COALESCE(m.field2[1], a.field2))
FROM   merge_accounts m
WHERE  m.uid = a.uid
AND   (m.status[1] IS NOT NULL AND a.status IS DISTINCT FROM m.status[1]
    OR m.field1[1] IS NOT NULL AND a.field1 IS DISTINCT FROM m.field1[1]
    OR m.field2[1] IS NOT NULL AND a.field2 IS DISTINCT FROM m.field2[1])
RETURNING a.*

m.status IS NOT NULL works if columns that shouldn't be updated are NULL in merge_accounts.
m.status <> '{}' if you operate with empty arrays.
m.status[1] IS NOT NULL covers both options.

Related:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks. Is using 'distinct' better than '!=' as an operator, or are they essentially the same? I'm not attached to the array concept - it was simply my first approach at being able to differentiate between a field that is null, and a field that doesn't need merging. – James Davies Jan 15 '15 at 07:16
  • 1
    @JamesDavies: It's different concerning NULL handling and it is essential that you understand the difference before you proceed. [Study the manual here.](http://www.postgresql.org/docs/current/interactive/functions-comparison.html) Aside: the standard SQL inequality operaor is `<>`. Postgres also accepts `!=`, but it's better to use the standard. – Erwin Brandstetter Jan 15 '15 at 07:20
  • Related answer by the same author: https://dba.stackexchange.com/a/118214/154985 – nh2 Feb 20 '22 at 17:37
3

if you aren't relying on side-effectts of the update, only update the records that need to change

UPDATE accounts
SET status = merge_accounts.status,
    field1 = merge_accounts.field1,
    field2 = merge_accounts.field2,
    etc.
FROM merge_accounts WHERE merge_accounts.uid =accounts.uid
 AND NOT (status IS NOT DISTINCT FROM merge_accounts.status 
      AND field1 IS NOT DISTINCT FROM merge_accounts.field1 
      AND field2 IS NOT DISTINCT FROM merge_accounts.field2
      )
RETURNING accounts.* 
Jasen
  • 11,837
  • 2
  • 30
  • 48
  • Guess this is a better solution. Why change/write data blocks unless it is really necessary? – Jayadevan Jan 15 '15 at 06:19
  • Isn't `CASE x WHEN true THEN false ELSE true` equivalent to `NOT (x)`? – Nick Barnes Jan 15 '15 at 06:38
  • 1
    By the way, if this needs to handle nulls, you'd probably want `status IS NOT DISTINCT FROM merge_accounts.status`, etc. in the condition. – Nick Barnes Jan 15 '15 at 06:41
  • @NickBarnes when x is null it's different, also this form allows me to use similar syntax to the update clause. – Jasen Jan 15 '15 at 06:50
  • I don't think you can get around it this easily... `NULL=1` is `NULL` and `NULL=NULL` is `NULL`, but we need to distinguish between these cases. – Nick Barnes Jan 15 '15 at 07:07
  • you've got a point there I need IS NOT DISTINCT FROM in the where clause, and then I might as well use not, – Jasen Jan 15 '15 at 07:21
0

I would recommend using the information_schema.columns table to introspect the columns dynamically, and then use those within a plpgsql function to dynamically generate the UPDATE statement.

i.e. this DDL:

create table foo
(
  id serial,
  val integer,
  name text
);

insert into foo (val, name) VALUES (10, 'foo'), (20, 'bar'), (30, 'baz');

And this query:

select column_name
from information_schema.columns
where table_name = 'foo'
order by ordinal_position;

will yield the columns for the table in the order that they were defined in the table DDL.

Essentially you would use the above SELECT within the function to dynamically build up your UPDATE statement by iterating over the results of the above SELECT in a FOR LOOP to dynamically build up both the SET and WHERE clauses.

khampson
  • 14,700
  • 4
  • 41
  • 43
0

Some variation of this ?

SELECT * FROM old;
 id | val 
----+-----
  1 |   1
  2 |   2
  4 |   5
  5 |   1
  6 |   2

SELECT * FROM new;
 id | val 
----+-----
  1 |   2
  2 |   2
  3 |   2
  5 |   1
  6 |   1

SELECT * FROM old JOIN new ON old.id = new.id;
 id | val | id | val 
----+-----+----+-----
  1 |   1 |  1 |   2
  2 |   2 |  2 |   2
  5 |   1 |  5 |   1
  6 |   2 |  6 |   1
(4 rows)

 WITH sel AS (
               SELECT o.id , o.val  FROM old o JOIN new n ON o.id=n.id  ),
 upd AS (
              UPDATE old  SET val = new.val  FROM new WHERE new.id=old.id RETURNING old.*  )
 SELECT * from sel, upd WHERE sel.id = upd.id AND sel.val <> upd.val;
 id | val | id | val 
----+-----+----+-----
  1 |   1 |  1 |   2
  6 |   2 |  6 |   1
(2 rows)

Refer SO answer and read the entire discussion.

Community
  • 1
  • 1
Jayadevan
  • 1,306
  • 2
  • 12
  • 33
0

If you are updating a single table and want to know if the row is actually changed you can use this query:

with rows_affected as (
    update mytable set (field1, field2, field3)=('value1', 'value2', 3) where id=1 returning *
)
select count(*)>0 as is_modified from rows_affected 
join mytable on mytable.id=rows_affected.id
where rows_affected is distinct from mytable;

And you can wrap your existing queries into this one without the need to modify the actual update statements.

xc218m
  • 69
  • 6