0

Suppose I have two tables in my Postgres database:

create table transactions
(
    id bigint primary key,
    doc_id bigint not null,
    -- lots of other columns...
    amount numeric not null
);

-- same columns
create temporary table updated_transactions
(
    id bigint primary key,
    doc_id bigint not null,
    -- lots of other columns...
    amount numeric not null
);

Both tables have just a primary key, and no unique indexes.

I need to upsert rows from updated_transactions into transactions using the following rules:

  • id column values in transactions and updated_transactions don't match
  • other columns like doc_id, etc (except of the amount) should match
  • when a matching row is found, update both amount and id columns
  • when a matching row is not found, insert it

id values in updated_transactions are taken from a sequence. A business object just populates updated_transactions and then merges the new or updated rows from it into transactions using an upsert query. So my old unchanged transactions keep their ids intact, and the updated ones are assigned new ids.

In MSSQL and Oracle, it would be a merge statement similar to this:

merge into transactions t
using updated_transactions ut on t.doc_id = ut.doc_id, ...
when matched then
    update set t.id = ut.id, t.amount = ut.amount
when not matched then
    insert (t.id, t.doc_id, ..., t.amount)
    values (ut.id, ut.doc_id, ..., ut.amount);

In PostgreSQL, I suppose it should be something like this:

insert into transactions(id, doc_id, ..., amount)
select coalesce(t.id, ut.id), ut.doc_id, ... ut.amount
from updated_transactions ut
left join transactions t on t.doc_id = ut.doc_id, ....
    on conflict
    on constraint transactions_pkey
    do update
    set amount = excluded.amount, id = excluded.id

The problem is with the do update clause: excluded.id is an old value from transactions table, while I need a new value from updated_transactions.

ut.id value is inaccessible for the do update clause, and the only thing I can use is the excluded row. But the excluded row has only coalesce(t.id, ut.id) expression which returns old id values for the existing rows.

Is it possible to update both id and amount columns using the upsert query?

yallie
  • 2,200
  • 1
  • 28
  • 26
  • `other columns like doc_id, etc (except of the amount) should match` sounds like a candidate key to me. – wildplasser Mar 29 '17 at 21:03
  • The values of doc_id and other columns, except of the amount, are not unique. I simplified the setup in my question to make my sample queries a bit more understandable. In my real case, I have to add a `row_number() over (partition by doc_id, ... order by id)` to match the rows. – yallie Mar 29 '17 at 21:17
  • In that case you cannot perform the update without falling back to `id`. – wildplasser Mar 29 '17 at 21:19
  • I still hope there is a way to do it. Actually, I see an ugly workaround by adding an unused `new_id` column to `transactions` and using `excluded.new_id` to have that `id` from `updated_transactions`. But I believe it can be avoided. – yallie Mar 29 '17 at 21:24
  • `when a matching row is found, update both amount and id columns` What if **more than one** record matches? (you have no unique match, given you have no other unique key tha `id`) – wildplasser Mar 29 '17 at 21:32
  • The rows are matched by column values **and** their row number (as ordered by id). If I have two rows with values `doc_id = 1, unit_id = 2, etc...`, the first row gets number 1, and the second gets number 2, and so on. If the row #1 is matched, it's updated. If the row #2 isn't matched, it's inserted. – yallie Mar 29 '17 at 21:42
  • I do hope you don't work for my bank. – wildplasser Mar 29 '17 at 21:48
  • Appreciate your help. – yallie Mar 29 '17 at 21:51

2 Answers2

2

Create unique index on those columns you use as key and pass its name in your upsert expression, so that it uses it instead of pkey. Then it will insert row if no matches were found, using ID from updated_transactions. If it finds match, then you can use excluded.id to get ID from updated_transactions.

I think that left join transactions is redundant.

So it would look kinda like this:

insert into transactions(id, doc_id, ..., amount)
select ut.id, ut.doc_id, ... ut.amount
from updated_transactions ut
    on conflict
    on constraint transactions_multi_column_unique_index
    do update
    set amount = excluded.amount, id = excluded.id
Łukasz Kamiński
  • 5,630
  • 1
  • 19
  • 32
  • Thanks Łukasz for your help. You are right, that's basically the same what wildplasser suggested in his comments. Unfortunately, I cannot create a unique index on these columns, perhaps I should have stated it explicitly in my question. – yallie Mar 30 '17 at 15:00
2

Looks like the task can be accomplished using writable CTEs instead of the plain upsert.

First, I'll post the easier version of the query that answers the original question as it was asked. This solution assumes that doc_id, unit_id columns address a candidate key, but doesn't require a unique index on these columns.

Test data:

create temp table transactions
(
    id bigint primary key,
    doc_id bigint,
    unit_id bigint,
    amount numeric
);

create temp table updated_transactions
(
    id bigint primary key,
    doc_id bigint,
    unit_id bigint,
    amount numeric
); 

insert into transactions(id, doc_id, unit_id, amount)
values (1, 1, 1, 10), (2, 1, 2, 15), (3, 1, 3, 10);

insert into updated_transactions(id, doc_id, unit_id, amount)
values (6, 1, 1, 11), (7, 1, 2, 15), (8, 1, 4, 20); 

The query to merge updated_transactions into transactions:

with new_values as
(
    select ut.id new_id, t.id old_id, ut.doc_id, ut.unit_id, ut.amount 
    from updated_transactions ut
    left join transactions t 
        on t.doc_id = ut.doc_id and t.unit_id = ut.unit_id
),
updated as
(
    update transactions tr
    set id = nv.new_id, amount = nv.amount
    from new_values nv
    where id = nv.old_id
    returning tr.*
)
insert into transactions(id, doc_id, unit_id, amount)
select ut.new_id, ut.doc_id, ut.unit_id, ut.amount
from new_values ut
where ut.new_id not in (select id from updated);

The results:

select * from transactions

-- id | doc_id | unit_id | amount
------+--------+---------+-------
--  3 |   1    |    3    |  10    -- not changed
--  6 |   1    |    1    |  11    -- updated
--  7 |   1    |    2    |  15    -- updated 
--  8 |   1    |    4    |  20    -- inserted

In my real application doc_id, unit_id aren't always unique, so they don't represent a candidate key. To match the rows I take into account the row number, calculated for the rows sorted by their ids. So here's my second solution.

Test data:

-- the tables are the same as above
insert into transactions(id, doc_id, unit_id, amount)
values (1, 1, 1, 10), (2, 1, 1, 15), (3, 1, 3, 10);

insert into updated_transactions(id, doc_id, unit_id, amount)
values (6, 1, 1, 11), (7, 1, 1, 15), (8, 1, 4, 20); 

The merge query:

with trans as
(
    select id, doc_id, unit_id, amount,
        row_number() over(partition by doc_id, unit_id order by id) row_num
    from transactions
),
updated_trans as
(
    select id, doc_id, unit_id, amount,
        row_number() over(partition by doc_id, unit_id order by id) row_num
    from updated_transactions
),
new_values as
(
    select ut.id new_id, t.id old_id, ut.doc_id, ut.unit_id, ut.amount 
    from updated_trans ut
    left join trans t 
        on t.doc_id = ut.doc_id and t.unit_id = ut.unit_id and t.row_num = ut.row_num
),
updated as
(
    update transactions tr
    set id = nv.new_id, amount = nv.amount
    from new_values nv
    where id = nv.old_id
    returning tr.*
)
insert into transactions(id, doc_id, unit_id, amount)
select ut.new_id, ut.doc_id, ut.unit_id, ut.amount
from new_values ut
where ut.new_id not in (select id from updated);

The results:

select * from transactions;

-- id | doc_id | unit_id | amount
------+--------+---------+-------
--  3 |   1    |    3    | 10     -- not changed
--  6 |   1    |    1    | 11     -- updated
--  7 |   1    |    1    | 15     -- updated
--  8 |   1    |    4    | 20     -- inserted

References:

Community
  • 1
  • 1
yallie
  • 2,200
  • 1
  • 28
  • 26
  • This would be the correct solution if only `on t.doc_id = ut.doc_id and t.unit_id = ut.unit_id` would address a candidate/natural key. (which it isn't, see the comments) – joop Mar 30 '17 at 16:36
  • Yes, my real solution is more complex because I need to address possible duplicates, as mentioned in the comments. But that wasn't part of the original question. Perhaps I should also include the complex query in my answer. – yallie Mar 30 '17 at 17:29
  • I've added the second query that takes into account the duplicates, as discussed in the comments. Thanks for the feedback joop. – yallie Mar 30 '17 at 17:54