1

I've a feed with 600k rows of property listings that I'd like to upsert daily into a table. But I'd also like to insert the old columns (or the whole row) into a different table in order to track the changes in the data. I need to know if, for example, how some property price evolved in the previous year.

The problem with the upsert is that is going to need all the fields to be specified, because I'd like to track the changes for every column so the "ON CONFLICT" clause would get too complicated:

INSERT INTO prop_listings (id, prop_type, price, rooms)
VALUES
  (33, 'house', 60000, 4)
, (22, 'apartment', 30000, 2)
ON CONFLICT (id, prop_type)  -- complicated clause
DO UPDATE SET
      prop_type = EXCLUDED.price_usd
    , price = EXCLUDED.volume_24h
      ...;

Another issue would be to track the old values (the columns being updated) in another table. I've read that this could be done with a TRIGGER.

So I'm thinking if that's the best way to do this or if there's anything else I'm missing.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Rod0n
  • 1,019
  • 2
  • 14
  • 33

1 Answers1

3

Don't list all columns in the ON CONFLICT clause, just the id (or whatever constitutes your primary key). So it's simpler:

INSERT INTO prop_listings AS p (id, prop_type, price, rooms)
VALUES
  (33, 'house', 60000, 4)
, (22, 'apartment', 30000, 2)
ON CONFLICT (id)  -- simple clause
DO UPDATE
SET    prop_type = EXCLUDED.price_usd
     , price     = EXCLUDED.volume_24h
     , rooms     = EXCLUDED.rooms
WHERE (p.prop_type, p.price, p.rooms) IS DISTINCT FROM
      (EXCLUDED.prop_type, EXCLUDED.price, EXCLUDED.rooms);

I added a WHERE clause to skip empty updates that wouldn't change anything - also skipping below trigger. This can make a huge difference if most of your new row versions are unchanged. See:

To keep track of old row versions, you could create a backup table with identical structure (let's call it prop_backup) and add a trigger like:

CREATE OR REPLACE FUNCTION trg_prop_backup()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   INSERT INTO prop_backup  -- table has same columns!
   VALUES (OLD.*);
   
   RETURN NEW;
END
$func$;

CREATE TRIGGER prop_listings_upd_bef
BEFORE UPDATE ON prop_listings
FOR EACH ROW EXECUTE FUNCTION trg_prop_backup();

Related:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • The trigger is going to insert only the values on conflict? – Rod0n Oct 24 '21 at 00:57
  • I forgot to consider the rows that doesn't appear anymore. Is there a simple change to take care of that or is another different problem? – Rod0n Oct 24 '21 at 01:03
  • 1
    The trigger makes a copy of every row before being updated. Should give a complete picture. If you want to delete rows that "don't appear anymore, either make the trigger `ON UPDATE OR DELETE` or manually insert them into the backup table after deleting. – Erwin Brandstetter Oct 24 '21 at 01:08
  • It's possible for the backup table to only contain the values on conflict? Otherwise I'll be inserting 600k rows per day on the backup table. – Rod0n Oct 24 '21 at 13:57
  • 1
    @Rod0n: That's what the solution currently does. The trigger fires `ON UPDATE`, i.e. only when the `INSERT` conflicts, and the `UPDATE` actually changes the row. To extend the trigger for the `DELETE` case like menationed above, you need to conditionally return `OLD`. See: https://stackoverflow.com/a/23744320/939860 – Erwin Brandstetter Oct 24 '21 at 16:49