2

I've got an UPSERT operation like this:

INSERT INTO people (SELECT * FROM people_update)
  ON CONFLICT (name,surname)
    DO UPDATE SET age = EXCLUDED.age , street = EXCLUDED.street , city = EXCLUDED.city , postal = EXCLUDED.postal
    WHERE 
      (people.age,people.street,people.city,people.postal) IS DISTINCT FROM (EXCLUDED.age,EXCLUDED.street,EXCLUDED.city,EXCLUDED.postal)
RETURNING case when xmax::text::int > 0 then 'updated' else 'inserted' end,name,surname,age,street,city,postal;

(name,surname) is a composite primary key and the people_update table contains additional and changed rows in comparison to the people table.

My question is: Is there a way to get the inserted and updated rows of this query as the returning rows?

I halfway solved the problem by adding a RETURNING clause but I would also like to get the old values within my return values.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    Having you used a `RETURNING` clause? It should do that. – Gordon Linoff Feb 28 '19 at 12:27
  • I've just read about the `RETURNING` clause but I would like to know, which rows were inserted and which were updated. I'm not really sure how to get that. – Thomas Dissert Feb 28 '19 at 12:31
  • Inserted and updated rows will [return different `xmax` values](https://stackoverflow.com/questions/39058213/postgresql-upsert-differentiate-inserted-and-updated-rows-using-system-columns-x), though it's undocumented behaviour, and so may not be reliable in the long term – Nick Barnes Feb 28 '19 at 14:57

1 Answers1

3

If you add a boolean updated column to the people table:

ALTER TABLE people ADD COLUMN updated bool DEFAULT FALSE;

then you could identify updated rows by setting updated = TRUE in the DO UPDATE SET clause:

INSERT INTO people (SELECT * FROM people_update)
  ON CONFLICT (name,surname)
    DO UPDATE SET age = EXCLUDED.age , street = EXCLUDED.street , city = EXCLUDED.city 
        , postal = EXCLUDED.postal
        , updated = TRUE
    WHERE 
      (people.age,people.street,people.city,people.postal) IS DISTINCT FROM 
      (EXCLUDED.age,EXCLUDED.street,EXCLUDED.city,EXCLUDED.postal)
RETURNING *;

For example,

CREATE TABLE test.people (
    name text
    , surname text
    , age float
    , street text
    , city text
    , postal int
);
CREATE UNIQUE INDEX people_idx on people (name, surname);
ALTER TABLE people ADD COLUMN updated bool;
ALTER TABLE people ADD COLUMN prior_age float;
ALTER TABLE people ADD COLUMN prior_street text;
ALTER TABLE people ADD COLUMN prior_city text;
ALTER TABLE people ADD COLUMN prior_postal int;

INSERT INTO people (name, surname, age, street, city, postal) VALUES 
('Sancho', 'Panza', 414, '1 Manchego', 'Barcelona', 01605)
, ('Oliver', 'Twist', 182, '2 Stilton', 'London', 01837)
, ('Quasi', 'Modo', 188, $$3 Rue d'Arcole$$, 'Paris' , 01831 )
;

CREATE TABLE test.people_update (
    name text
    , surname text
    , age float
    , street text
    , city text
    , postal int
);

INSERT INTO people_update (name, surname, age, street, city, postal) VALUES 
('Sancho', 'Panza', 4140, '10 Idiazabal', 'Montserrat', 16050)
, ('Quasi', 'Modo', 1880, $$30 Champs Elysée$$ , 'Paris', 18310 )
, ('Pinocchio', 'Geppetto', 1380, '40 Nerbone', 'Florence', 18810)
;

INSERT INTO people (SELECT * FROM people_update)
  ON CONFLICT (name,surname)
    DO UPDATE SET 
        updated = TRUE
        , prior_age = (CASE WHEN people.age = EXCLUDED.age THEN NULL ELSE people.age END)
        , prior_street = (CASE WHEN people.street = EXCLUDED.street THEN NULL ELSE people.street END)
        , prior_city = (CASE WHEN people.city = EXCLUDED.city THEN NULL ELSE people.city END)
        , prior_postal = (CASE WHEN people.postal = EXCLUDED.postal THEN NULL ELSE people.postal END)
        , age = EXCLUDED.age 
        , street = EXCLUDED.street 
        , city = EXCLUDED.city 
        , postal = EXCLUDED.postal
    WHERE 
      (people.age,people.street,people.city,people.postal) IS DISTINCT FROM 
      (EXCLUDED.age,EXCLUDED.street,EXCLUDED.city,EXCLUDED.postal)
RETURNING *;

yields

| name       | surname  |  age | street           | city       | postal | updated | prior_age | prior_street   | prior_city | prior_postal |
|------------+----------+------+------------------+------------+--------+---------+-----------+----------------+------------+--------------|
| Sancho     | Panza    | 4140 | 10 Idiazabal     | Montserrat |  16050 | t       |       414 | 1 Manchego     | Barcelona  |         1605 |
| Quasi      | Modo     | 1880 | 30 Champs Elysée | Paris      |  18310 | t       |       188 | 3 Rue d'Arcole |            |         1831 |
| Pinocchio  | Geppetto | 1380 | 40 Nerbone       | Florence   |  18810 | f       |           |                |            |              |

The updated column shows the ('Sancho', 'Panza') and ('Quasi', 'Modo') lines have been updated, and ('Pinocchio', 'Geppetto') is a new insert.

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • That's about the result I got myself so far, but I am struggling with my added requirement being I would also like to get the old values within my return values. – Thomas Dissert Feb 28 '19 at 15:37
  • I'm not aware of any built in mechanism for obtaining this information other than adding more columns to record the prior values. I've update the post to show what I mean. – unutbu Feb 28 '19 at 16:04
  • You can add an Update trigger to capture and log OLD and NEW values. But there is no built-in capability for specific SQL. If you stick with the Boolean column option make sure your load script sets them all to false before loading. – Belayer Dec 21 '19 at 18:33