786

Several months ago I learned from an answer on Stack Overflow how to perform multiple updates at once in MySQL using the following syntax:

INSERT INTO table (id, field, field2) VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z')
ON DUPLICATE KEY UPDATE field=VALUES(field), field2=VALUES(field2);

I've now switched over to PostgreSQL, and apparently this syntax is not supported. The statement references all the correct tables, so I assume it's a matter of different keywords being used, but I'm not sure where this is covered in the PostgreSQL documentation.

To clarify, I want to insert several things and if they already exist to update them.

Teifion
  • 46
    Anybody who finds this question should read Depesz's article ["Why is upsert so complicated?"](http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/). It explains the issue and possible solutions extremely well. – Craig Ringer Oct 14 '12 at 01:22
  • 8
    UPSERT will be added in Postgres 9.5: https://wiki.postgresql.org/wiki/UPSERT#MySQL:_INSERT_..._ON_DUPLICATE_KEY_UPDATE – tommed Jun 15 '15 at 12:53
  • 5
    @tommed - it has been done: http://stackoverflow.com/a/34639631/4418 – warren Jan 06 '16 at 17:59

18 Answers

729

Since version 9.5, PostgreSQL has UPSERT syntax with the ON CONFLICT clause, similar to MySQL's:

INSERT INTO the_table (id, column_1, column_2) 
VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z')
ON CONFLICT (id) DO UPDATE 
  SET column_1 = excluded.column_1, 
      column_2 = excluded.column_2;

Searching PostgreSQL's mailing list archives for "upsert" also leads to an example in the manual of what you possibly want to do:

Example 38-2. Exceptions with UPDATE/INSERT

This example uses exception handling to perform either UPDATE or INSERT, as appropriate:

CREATE TABLE db (a INT PRIMARY KEY, b TEXT);

CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the key
        -- note that "a" must be unique
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key
        -- if someone else inserts the same key concurrently,
        -- we could get a unique-key failure
        BEGIN
            INSERT INTO db(a,b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;

SELECT merge_db(1, 'david');
SELECT merge_db(1, 'dennis');

There's also a possible way to do this in bulk, using writable CTEs (9.1 and above), from the hackers mailing list:

WITH foos AS (SELECT (UNNEST(%foo[])).*),
updated AS (UPDATE foo SET a = foos.a ... RETURNING foo.id)
INSERT INTO foo SELECT foos.* FROM foos LEFT JOIN updated USING(id)
WHERE updated.id IS NULL;

See a_horse_with_no_name's answer for a clearer example.

Stephen Denne
  • 8
    The only thing I don't like about this is that it would be much slower, because each upsert would be its own individual call into the database. – baash05 Mar 01 '12 at 00:27
  • @baash05 there might be a way to do it in bulk, see my updated answer. – Stephen Denne Mar 02 '12 at 00:02
  • 2
    The only thing I'd do differently is to use FOR 1..2 LOOP instead of just LOOP so that if some other unique constraint is violated it won't spin indefinitely. – olamork Dec 05 '13 at 22:32
  • hey @olamork, want to submit a pull request implementing that? https://github.com/seamusabshere/upsert/blob/master/lib/upsert/merge_function/postgresql.rb#L112 – Seamus Abshere Jan 29 '14 at 20:33
  • @d11wtq AIUI mysql does it using table locks. – Jasen Jan 10 '15 at 03:32
  • @d11wtq you are right, but hopefully [that will be changing soon](https://wiki.postgresql.org/wiki/UPSERT#Revisions). –  Jan 17 '15 at 14:58
  • A possible problem with the 'CTEs in 9.1' solution, which uses `INSERT INTO foo SELECT foos.* FROM foos LEFT JOIN updated USING(id)`, is that you cannot easily get the updated row using the 'RETURNING' clause. In case of an insert it will return the inserted row, but in case of an update it won't. – antiplayer Dec 01 '17 at 11:21
  • Another problem is that you will have to include all the `NOT NULL` columns to the `INSERT` statement, even though you have no intention to update those columns, otherwise, you'll get a constraint violation error. – phil pirozhkov Nov 07 '18 at 09:35
  • 14
    What does the `excluded` refer to in the first solution here? – ichbinallen Dec 21 '18 at 20:24
  • 10
    @ichbinallen [in the docs](https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT) _The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row using the table's name (or an alias), and to rows proposed for insertion using the special excluded table_. In this case, the special `excluded` table gives you access to the values you were trying to INSERT in the first place. – TMichel Nov 22 '19 at 10:34
  • 1
    In my case I wanted to update/insert a live table from a temp one. As its a common use case I thought I'd include it. `INSERT INTO table1 SELECT * FROM table2 ON CONFLICT ("field1") DO UPDATE SET "field1"=EXCLUDED."field1", "field2"=EXCLUDED."field2";` – koax26 Jun 26 '20 at 01:46
  • Does ON CONFLICT assume or require that the column is/be unique? What could be an alternative to deleting existing non-unique values and then re-inserting them? – TheRealChx101 Aug 29 '21 at 16:00
451

Warning: this is not safe if executed from multiple sessions at the same time (see caveats below).


Another clever way to do an "UPSERT" in postgresql is to do two sequential UPDATE/INSERT statements that are each designed to succeed or have no effect.

UPDATE mytable SET field='C', field2='Z' WHERE id=3;
INSERT INTO mytable (id, field, field2)
       SELECT 3, 'C', 'Z'
       WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE id=3);

The UPDATE will succeed if a row with "id=3" already exists, otherwise it has no effect.

The INSERT will succeed only if row with "id=3" does not already exist.

You can combine these two into a single string and run them both with a single call from your application. Running them together in a single transaction is highly recommended.
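
For illustration, a minimal sketch of the combined form wrapped in a single transaction (same hypothetical mytable as above):

BEGIN;
-- no-op if the row does not exist yet
UPDATE mytable SET field='C', field2='Z' WHERE id=3;
-- no-op if a row with id=3 already exists
INSERT INTO mytable (id, field, field2)
       SELECT 3, 'C', 'Z'
       WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE id=3);
COMMIT;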

This works very well when run in isolation or on a locked table, but is subject to race conditions: it might still fail with a duplicate key error if a row is inserted concurrently, or might terminate with no row inserted when a row is deleted concurrently. A SERIALIZABLE transaction on PostgreSQL 9.1 or higher will handle it reliably at the cost of a very high serialization failure rate, meaning you'll have to retry a lot. See "Why is upsert so complicated?", which discusses this case in more detail.

This approach is also subject to lost updates in read committed isolation unless the application checks the affected row counts and verifies that either the insert or the update affected a row.

bovine
  • Question, does the INSERT fail if the record exists? or does it insert a blank record? Would this also work if I'm not using the id (pk) and just another field that's unique? – Phill Pafford Oct 12 '11 at 14:19
  • 8
    Short answer: if the record exists the INSERT does nothing. Long answer: the SELECT in the INSERT will return as many results as there are matches of the where clause. That is at most one (if the number one is not in the result of the sub-select), otherwise zero. The INSERT will thus add either one or zero rows. – Peter Becker Oct 15 '11 at 10:34
  • 3
    the 'where' part can be simplified by using exists: `... where not exists (select 1 from table where id = 3);` – Endy Tjahjono Oct 21 '11 at 06:46
  • 1
    this should be the right answer.. with some minor tweaks, it could be used to do a mass update.. Humm.. I wonder if a temp table could be used.. – baash05 Mar 01 '12 at 00:03
  • I heard this INSERT syntax is only supported in postgres version 9.1 or greater. Can someone confirm this? – keaplogik Feb 27 '13 at 18:49
  • 1
    @keaplogik, that 9.1 limitation is with writable CTE (common table expressions) that is described in another of the answers. The syntax used in this answer is very basic and has been long supported. – bovine Mar 06 '13 at 19:58
  • 10
    Warning, this is subject to lost updates in `read committed` isolation unless your application checks to make sure that the `insert` or the `update` have a non-zero rowcount. See http://dba.stackexchange.com/q/78510/7788 – Craig Ringer Oct 07 '14 at 07:16
  • Thanks, this should tide me over until Postgres 9.5! http://www.craigkerstiens.com/2015/05/08/upsert-lands-in-postgres-9.5/ – Carl Jun 10 '15 at 14:49
  • I don't have such a deep understanding of Postgres... Is there no locking option or something to guarantee that the server runs these two commands consecutively? – sudo Jun 19 '15 at 18:40
  • It doesn't work if I don't even know what the id is, this is however a usual cases since many people rely on auto increment id. – Alston Apr 29 '17 at 09:18
  • `SELECT 3, 'C', 'Z'` How are you able to SELECT via row values instead of column names...? – insivika May 11 '20 at 17:01
  • Will it possible do something like `ON CONFLICT (id, column_1)`? I mean only if both id & column_1 are same with current insert, then will fall over to update. – atline May 20 '20 at 07:16
238

With PostgreSQL 9.1 this can be achieved using a writeable CTE (common table expression):

WITH new_values (id, field1, field2) as (
  values 
     (1, 'A', 'X'),
     (2, 'B', 'Y'),
     (3, 'C', 'Z')

),
upsert as
( 
    update mytable m 
        set field1 = nv.field1,
            field2 = nv.field2
    FROM new_values nv
    WHERE m.id = nv.id
    RETURNING m.*
)
INSERT INTO mytable (id, field1, field2)
SELECT id, field1, field2
FROM new_values
WHERE NOT EXISTS (SELECT 1 
                  FROM upsert up 
                  WHERE up.id = new_values.id)

See these blog entries:


Note that this solution does not prevent a unique key violation but it is not vulnerable to lost updates.
See the follow up by Craig Ringer on dba.stackexchange.com
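
If concurrent executions are a concern, one option suggested in the comments is to take an explicit table lock in the same transaction, which serializes all writers on mytable. A sketch of the wrapped statement:

BEGIN;
LOCK TABLE mytable IN SHARE ROW EXCLUSIVE MODE;
WITH new_values (id, field1, field2) AS (
  VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z')
),
upsert AS (
    UPDATE mytable m
        SET field1 = nv.field1,
            field2 = nv.field2
    FROM new_values nv
    WHERE m.id = nv.id
    RETURNING m.*
)
INSERT INTO mytable (id, field1, field2)
SELECT id, field1, field2
FROM new_values
WHERE NOT EXISTS (SELECT 1
                  FROM upsert up
                  WHERE up.id = new_values.id);
COMMIT;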

a_horse_with_no_name
  • Is this better than a stored procedure? – François Beausoleil Feb 21 '12 at 15:11
  • 1
    @FrançoisBeausoleil: the chance of a race condition is much smaller than with the "try/handle exception" approach –  Feb 21 '12 at 15:32
  • 2
    @a_horse_with_no_name How do you exactly mean that the chance on race conditions is much smaller? When I execute this query concurrently with the same records I'm getting the error "duplicate key value violates unique constraint" 100% of the times until the query detects that the record has been inserted. Is this a complete example? – Jeroen van Dijk Mar 26 '12 at 09:39
  • 5
    @a_horse_with_no_name Your solution seems to work in concurrent situations when you wrap the upsert statement with the following lock: BEGIN WORK; LOCK TABLE mytable IN SHARE ROW EXCLUSIVE MODE; ; COMMIT WORK; – Jeroen van Dijk Mar 26 '12 at 11:15
  • 2
    @JeroenvanDijk: thanks. What I meant with "much smaller" is that if several transactions do this (and commit the change!), the time span between the update and the insert is smaller as everything is just a single statement. You can always generate a pk violation by two independent INSERT statements. If you lock the whole table, you effectively serialize all access to it (something you could achieve with the serializable isolation level as well). –  Mar 26 '12 at 11:28
  • @a_horse_with_no_name I'm not very experienced with Postgres so I'm not sure what is more optimal for concurrent situations, but I believe I read on the PG mailing list that locks are needed and would probably be the most efficient. I tried a few locks and this lock seems to work perfectly even with several concurrent threads trying to insert the same data. The serializable isolation level you are talking about seems to be a less fine-grained way to achieve this (http://www.postgresql.org/docs/8.4/static/transaction-iso.html), am I right? – Jeroen van Dijk Mar 28 '12 at 08:16
  • Please note that PostgreSQL doesn't want the `m.` prefix in the SET part of the UPDATE statement. We should have `update mytable m set field1 = nv.field1, field2 = nv.field2` – Ludovic Kuty Jul 09 '12 at 10:54
  • This won't work if you have duplicate `id`s in `new_values` (very possible if `id` is a foreign key) as the insert step doesn't keep track of all its inserted values. – David Murdoch Jul 31 '12 at 18:38
  • @DavidMurdoch: well it assumes that the `id` column is the primary key. Otherwise a regular update wouldn't work either. –  Jul 31 '12 at 19:29
  • I don't see how a regular update wouldn't work just because you don't specify the primary key? Or did I miss your point entirely? – David Murdoch Jul 31 '12 at 23:56
  • @DavidMurdoch: a regular update/insert (as e.g. done in the examples in the manual) would "not work" the same way as my statement would "not work". If `id` isn't the PK then it's going to update too many rows. –  Aug 01 '12 at 06:37
  • 1
    Got it, you could still have the fk be unique (or have a unique constraint over multiple columns, upon which you are updating/inserting). My main comment was directed at passers-by in the future who might have the same issue I had, btw. Thanks for this and +1. – David Murdoch Aug 01 '12 at 17:13
  • 1
    Is it possible to make this solution return a field from the row regardless of whether it was updated or inserted? – agnsaft Dec 28 '12 at 14:59
  • 14
    This solution is subject to lost updates if the inserting transaction rolls back; there's no check to enforce that the `UPDATE` affected any rows. – Craig Ringer Jun 24 '13 at 03:47
  • Hello horse... i follow a very long path of crumbs regarding this issue. But I was thinking what about a `BEFORE INSERT TRIGGER` check each row would be too expensive? – Juan Carlos Oropeza Jul 07 '16 at 21:51
  • @a_horse_with_no_name you saved my day and I will explain why. The "Upsert" from Postgres requires that the `ON CONFLICT` is run over a constraint name or **all** the columns in the primary key. But if you want to update on conflict of a subset of the columns of the primary key, it will throw an exception. This solution instead works well. Thanks – JeanValjean Nov 02 '20 at 12:00
172

In PostgreSQL 9.5 and newer you can use INSERT ... ON CONFLICT UPDATE.

See the documentation.

A MySQL INSERT ... ON DUPLICATE KEY UPDATE can be directly rephrased to an ON CONFLICT UPDATE. Neither is SQL-standard syntax; they're both database-specific extensions. There are good reasons MERGE wasn't used for this; a new syntax wasn't created just for fun. (MySQL's syntax also has issues that mean it wasn't adopted directly.)

e.g. given setup:

CREATE TABLE tablename (a integer primary key, b integer, c integer);
INSERT INTO tablename (a, b, c) values (1, 2, 3);

the MySQL query:

INSERT INTO tablename (a,b,c) VALUES (1,2,3)
  ON DUPLICATE KEY UPDATE c=c+1;

becomes:

INSERT INTO tablename (a, b, c) values (1, 2, 10)
ON CONFLICT (a) DO UPDATE SET c = tablename.c + 1;

Differences:

  • You must specify the column name (or unique constraint name) to use for the uniqueness check. That's the ON CONFLICT (columnname) DO UPDATE part.

  • The keyword SET must be used, as if this was a normal UPDATE statement

It has some nice features too:

  • You can have a WHERE clause on your UPDATE (letting you effectively turn ON CONFLICT UPDATE into ON CONFLICT IGNORE for certain values); see the sketch after this list

  • The proposed-for-insertion values are available as the row-variable EXCLUDED, which has the same structure as the target table. You can get the original values in the table by using the table name. So in this case EXCLUDED.c will be 10 (because that's what we tried to insert) and tablename.c will be 3 because that's the current value in the table. You can use either or both in the SET expressions and WHERE clause.
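
For illustration, a small sketch of that WHERE clause (same tablename as above; the "only keep the larger value" rule is an invented example). The conflicting row is updated only when the incoming value is greater than the stored one; otherwise the statement does nothing:

INSERT INTO tablename (a, b, c) VALUES (1, 2, 10)
ON CONFLICT (a) DO UPDATE
  SET c = EXCLUDED.c
  WHERE tablename.c < EXCLUDED.c;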

For background on upsert see How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?

Craig Ringer
  • I have looked into PostgreSQL's 9.5 solution as you described above because I was experiencing gaps in the auto increment field while under MySQL's `ON DUPLICATE KEY UPDATE`. I have downloaded Postgres 9.5 and implemented your code but strangely the same issue occurs under Postgres: the primary key's serial field is not consecutive (there are gaps between the inserts and updates.). Any idea what is going on here? Is this normal? Any idea how to avoid this behavior? Thank you. – W.M. Aug 07 '16 at 10:47
  • @W.M. That's pretty much inherent to an upsert operation. You have to evaluate the function that generates the sequence before attempting the insert. Since such sequences are designed to operate concurrently they're exempt from normal transaction semantics, but even if they weren't the generation is not called in a subtransaction and rolled back, it completes normally and commits with the rest of the operation. So this would happen even with "gapless" sequence implementations. The only way the DB could avoid this would be to delay evaluation of sequence generation until after the key check. – Craig Ringer Aug 08 '16 at 01:15
  • 2
    @W.M. which would create its own problems. Basically, you're stuck. But if you're relying on serial / auto_increment being gapless you've already got bugs. You can have sequence gaps due to rollbacks including transient errors - reboots under load, client errors mid-transaction, crashes, etc. You must never, ever rely on `SERIAL` / `SEQUENCE` or `AUTO_INCREMENT` not having gaps. If you need gapless sequences they're more complex; you need to use a counter table usually. Google will tell you more. But be aware gapless sequences prevent all insert concurrency. – Craig Ringer Aug 08 '16 at 01:17
  • @W.M. If you absolutely do require gapless sequences and upsert, you could use the function-based upsert approach discussed in the manual along with a gapless sequence implementation that uses a counter table. Because the `BEGIN ... EXCEPTION ...` runs in a subtransaction that gets rolled back on error, your sequence increment would get rolled back if the `INSERT` failed. – Craig Ringer Aug 08 '16 at 01:18
  • Thank you very much @Craig Ringer, that was pretty informative. I realized that I can simply give up on having that auto increment primary key. I made a composite primary of 3 fields and for my particular current need, there is really no need for a gapless auto increment field. Thank you again, the information you provided would save me time in the future trying to prevent a natural and healthy DB behavior. I understand it better now. – W.M. Aug 08 '16 at 15:58
  • @W.M. Why are gaps in a serial column used as a primary key a problem? They are still unique, and they are still ascending, just not guaranteed consecutive. Why is gapless so important? A 3 column primary key might be OK depending on circumstances, but if these 3 columns are wide datatypes, and you have lots of non-clustered indexes then you are creating a lot of overhead in your system. – Davos Nov 06 '17 at 04:53
  • @Davos They usually aren't, but things like cheque numbers can have business requirements – Craig Ringer Nov 06 '17 at 10:33
  • Will it possible do something like `ON CONFLICT (id, column_1)`? I mean only if both id & column_1 are same with current insert, then will fall over to update. – atline May 20 '20 at 07:17
18

I was looking for the same thing when I came here, but the lack of a generic "upsert" function bothered me a bit, so I thought you could just pass the update and insert SQL as arguments to that function from the manual.

That would look like this:

CREATE FUNCTION upsert (sql_update TEXT, sql_insert TEXT)
    RETURNS VOID
    LANGUAGE plpgsql
AS $$
BEGIN
    LOOP
        -- first try to update
        EXECUTE sql_update;
        -- check if the row is found
        IF FOUND THEN
            RETURN;
        END IF;
        -- not found so insert the row
        BEGIN
            EXECUTE sql_insert;
            RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- do nothing and loop
        END;
    END LOOP;
END;
$$;
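
A hypothetical call (the table and column names are made up for illustration; dollar-quoting keeps the embedded string literals readable):

SELECT upsert(
    $q$ UPDATE mytable SET field1 = 'A' WHERE id = 1 $q$,
    $q$ INSERT INTO mytable (id, field1) VALUES (1, 'A') $q$
);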

and perhaps to do what you initially wanted to do, a batch "upsert", you could use Tcl to split the sql_update and loop over the individual updates; the performance hit will be very small, see http://archives.postgresql.org/pgsql-performance/2006-04/msg00557.php

The highest cost is executing the query from your code; on the database side the execution cost is much smaller.

Paul Scheltema
  • 3
    You still have to run this in a retry loop and it's prone to races with a concurrent `DELETE` unless you lock the table or are in `SERIALIZABLE` transaction isolation on PostgreSQL 9.1 or greater. – Craig Ringer May 22 '13 at 10:00
14

There is no simple command to do it.

The most correct approach is to use a function, like the one from the docs.

Another solution (although not that safe) is to do an update with RETURNING, check which rows were updated, and insert the rest of them.

Something along the lines of:

update table
set column = x.column
from (values (1,'aa'),(2,'bb'),(3,'cc')) as x (id, column)
where table.id = x.id
returning id;

assuming id:2 was returned:

insert into table (id, column) values (1, 'aa'), (3, 'cc');

Of course it will bail out sooner or later (in a concurrent environment), as there is a clear race condition here, but usually it will work.

Here's a longer and more comprehensive article on the topic.

Craig Ringer
  • 1
    If using this option, be sure to check that the id is returned even if the update does nothing. I've seen databases optimise-away queries like "Update table foo set bar = 4 where bar = 4". – thelem Jan 20 '12 at 14:58
11

I use this merge function:

CREATE OR REPLACE FUNCTION merge_tabla(key INT, data TEXT)
  RETURNS void AS
$BODY$
BEGIN
    IF EXISTS(SELECT a FROM tabla WHERE a = key)
        THEN
            UPDATE tabla SET b = data WHERE a = key;
        RETURN;
    ELSE
        INSERT INTO tabla(a,b) VALUES (key, data);
        RETURN;
    END IF;
END;
$BODY$
LANGUAGE plpgsql;
Mise
  • 2
    It is more efficient to simply do the `update` first and then check the number of updated rows. (See Ahmad's answer) –  Jan 09 '15 at 10:47
10

Personally, I've set up a "rule" attached to the insert statement. Say you had a "dns" table that recorded dns hits per customer on a per-time basis:

CREATE TABLE dns (
    "time" timestamp without time zone NOT NULL,
    customer_id integer NOT NULL,
    hits integer
);

You wanted to be able to re-insert rows with updated values, or create them if they didn't already exist, keyed on customer_id and the time. Something like this:

CREATE RULE replace_dns AS 
    ON INSERT TO dns 
    WHERE (EXISTS (SELECT 1 FROM dns WHERE ((dns."time" = new."time") 
            AND (dns.customer_id = new.customer_id)))) 
    DO INSTEAD UPDATE dns 
        SET hits = new.hits 
        WHERE ((dns."time" = new."time") AND (dns.customer_id = new.customer_id));

Update: This has the potential to fail if simultaneous inserts are happening, as it will generate unique_violation exceptions. However, the non-terminated transaction will continue and succeed, and you just need to repeat the terminated transaction.

However, if there are tons of inserts happening all the time, you will want to put a table lock around the insert statements: SHARE ROW EXCLUSIVE locking will prevent any operations that could insert, delete or update rows in your target table. However, updates that do not update the unique key are safe, so if no operation will do this, use advisory locks instead.
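
For illustration, a minimal sketch of taking that lock around an insert (the inserted values are made up):

BEGIN;
-- SHARE ROW EXCLUSIVE blocks concurrent inserts/updates/deletes on dns
LOCK TABLE dns IN SHARE ROW EXCLUSIVE MODE;
INSERT INTO dns ("time", customer_id, hits) VALUES (now(), 42, 10);
COMMIT;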

Also, the COPY command does not use RULES, so if you're inserting with COPY, you'll need to use triggers instead.

Ch'marr
8

Similar to the most-liked answer, but works slightly faster:

WITH upsert AS (
    UPDATE spider_count SET tally = 1 WHERE date = 'today' RETURNING *
)
INSERT INTO spider_count (spider, tally)
SELECT 'Googlebot', 1
WHERE NOT EXISTS (SELECT * FROM upsert);

(source: http://www.the-art-of-web.com/sql/upsert/)

alexkovelsky
  • 5
    This will fail if run concurrently in two sessions, because neither update will see an existing row so both updates will hit zero rows, so both queries will issue an insert. – Craig Ringer May 08 '15 at 07:55
7

I customized the "upsert" function above, for when you want to INSERT AND REPLACE:

CREATE OR REPLACE FUNCTION upsert(sql_insert text, sql_update text)
  RETURNS void AS
$BODY$
BEGIN
    -- first try to insert, and afterwards update. Note: the insert has the pk and the update does not...
    EXECUTE sql_insert;
    RETURN;
EXCEPTION WHEN unique_violation THEN
    EXECUTE sql_update;
    IF FOUND THEN
        RETURN;
    END IF;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

ALTER FUNCTION upsert(text, text)
  OWNER TO postgres;

And afterwards, to execute it, do something like this:

SELECT upsert($$INSERT INTO ...$$,$$UPDATE... $$)

It is important to use double dollar quoting ($$) around each statement to avoid quoting errors.

  • check the speed...
Felipe FMMobile
6

According to the PostgreSQL documentation of the INSERT statement, handling the ON DUPLICATE KEY case is not supported. That part of the syntax is a proprietary MySQL extension.

Christian Hang-Hicks
  • @Lucian `MERGE` is also really more of an OLAP operation; see http://stackoverflow.com/q/17267417/398670 for explanation. It doesn't define concurrency semantics and most people who use it for upsert are just creating bugs. – Craig Ringer May 08 '15 at 07:56
6
CREATE OR REPLACE FUNCTION save_user(_id integer, _name character varying)
  RETURNS boolean AS
$BODY$
BEGIN
    UPDATE users SET name = _name WHERE id = _id;
    IF FOUND THEN
        RETURN true;
    END IF;
    BEGIN
        INSERT INTO users (id, name) VALUES (_id, _name);
    EXCEPTION WHEN OTHERS THEN
        UPDATE users SET name = _name WHERE id = _id;
    END;
    RETURN true;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE STRICT;
Ahmad
6

I have the same issue for managing account settings as name value pairs. The design criteria is that different clients could have different settings sets.

My solution, similar to JWP's, is to bulk erase and replace, generating the merge record within your application.

This is pretty bulletproof, platform independent, and since there are never more than about 20 settings per client, this is only 3 fairly low-load db calls - probably the fastest method.

The alternative of updating individual rows - checking for exceptions, then inserting - or some combination thereof, is hideous code, slow, and often breaks because (as mentioned above) non-standard SQL exception handling changes from db to db - or even release to release.

 #This is pseudo-code - within the application:
 BEGIN TRANSACTION - get transaction lock
 SELECT all current name value pairs where id = $id into a hash record
 create a merge record from the current and update record
  (set intersection where shared keys in new win, and empty values in new are deleted).
 DELETE all name value pairs where id = $id
 COPY/INSERT merged records 
 END TRANSACTION
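
For illustration, a minimal SQL sketch of that flow, assuming a hypothetical settings(id, name, value) table and that the merged pairs have already been computed in the application:

BEGIN;
-- wipe the client's current name/value pairs
DELETE FROM settings WHERE id = 42;
-- re-insert the merged set computed in the application
INSERT INTO settings (id, name, value) VALUES
    (42, 'theme',    'dark'),
    (42, 'language', 'en');
COMMIT;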
benno
  • Welcome to SO. Nice introduction! :-) – Don Question Nov 08 '12 at 23:36
  • 1
    This is more like `REPLACE INTO` than `INSERT INTO ... ON DUPLICATE KEY UPDATE`, which may cause a problem if you use triggers. You'll end up running delete and insert triggers/rules, rather than update ones. – cHao May 15 '14 at 16:48
5

For merging small sets, using the above function is fine. However, if you are merging large amounts of data, I'd suggest looking into http://mbk.projects.postgresql.org

The current best practice that I'm aware of is the following (a sketch follows the list):

  1. COPY new/updated data into temp table (sure, or you can do INSERT if the cost is ok)
  2. Acquire Lock [optional] (advisory is preferable to table locks, IMO)
  3. Merge. (the fun part)
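
A minimal sketch of those three steps, assuming a hypothetical target(id, val) table and data arriving from a CSV file (the path, names and the optional lock are illustrative only):

BEGIN;
-- 1. load new/updated data into a temp table
CREATE TEMP TABLE staging (LIKE target INCLUDING ALL) ON COMMIT DROP;
COPY staging FROM '/path/to/data.csv' WITH (FORMAT csv);
-- 2. optional lock to keep concurrent writers out during the merge
LOCK TABLE target IN SHARE ROW EXCLUSIVE MODE;
-- 3. merge: update the rows that exist, insert the ones that don't
UPDATE target t SET val = s.val FROM staging s WHERE t.id = s.id;
INSERT INTO target (id, val)
SELECT s.id, s.val FROM staging s
WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id);
COMMIT;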
jwp
4

UPDATE will return the number of modified rows. If you use JDBC (Java), you can then check this value against 0 and, if no rows were affected, fire an INSERT instead. Most other programming languages expose the modified row count as well; check your driver's documentation.

This may not be as elegant, but you have much simpler SQL that is more trivial to use from the calling code. By contrast, if you write the ten-line script in PL/pgSQL, you probably should have a unit test of one kind or another just for it alone.
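
The same idea expressed server-side in PL/pgSQL, as a sketch (the table, column and function names are made up; in JDBC you would instead inspect the int returned by executeUpdate(), and the usual concurrency caveats from the other answers still apply):

CREATE OR REPLACE FUNCTION upsert_user(_id integer, _name text) RETURNS void AS $$
DECLARE
    affected integer;
BEGIN
    UPDATE users SET name = _name WHERE id = _id;
    GET DIAGNOSTICS affected = ROW_COUNT;  -- rows touched by the UPDATE
    IF affected = 0 THEN
        INSERT INTO users (id, name) VALUES (_id, _name);
    END IF;
END;
$$ LANGUAGE plpgsql;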

Audrius Meškauskas
4

PostgreSQL >= v15

Big news on this topic: in PostgreSQL v15, it is possible to use the MERGE command. In fact, this long-awaited feature was listed first among the improvements of the v15 release.

This is similar to INSERT ... ON CONFLICT but more batch-oriented. It has a powerful WHEN MATCHED vs WHEN NOT MATCHED structure that gives the ability to INSERT, UPDATE or DELETE under such conditions.

It not only eases bulk changes, it even adds more control than traditional UPSERT and INSERT ... ON CONFLICT.

Take a look at this very complete sample from the official page:

MERGE INTO wines w
USING wine_stock_changes s
ON s.winename = w.winename
WHEN NOT MATCHED AND s.stock_delta > 0 THEN
  INSERT VALUES(s.winename, s.stock_delta)
WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN
  UPDATE SET stock = w.stock + s.stock_delta
WHEN MATCHED THEN
  DELETE;

PostgreSQL v9, v10, v11, v12, v13, v14

If your version is below v15 but at least v9.5, the best choice is probably the UPSERT syntax with the ON CONFLICT clause, as shown in the answers above.

plmk
3

Edit: This does not work as expected. Unlike the accepted answer, this produces unique key violations when two processes repeatedly call upsert_foo concurrently.

Eureka! I figured out a way to do it in one query: use UPDATE ... RETURNING to test if any rows were affected:

CREATE TABLE foo (k INT PRIMARY KEY, v TEXT);

CREATE FUNCTION update_foo(k INT, v TEXT)
RETURNS SETOF INT AS $$
    UPDATE foo SET v = $2 WHERE k = $1 RETURNING $1
$$ LANGUAGE sql;

CREATE FUNCTION upsert_foo(k INT, v TEXT)
RETURNS VOID AS $$
    INSERT INTO foo
        SELECT $1, $2
        WHERE NOT EXISTS (SELECT update_foo($1, $2))
$$ LANGUAGE sql;

The UPDATE has to be done in a separate procedure because, unfortunately, this is a syntax error:

... WHERE NOT EXISTS (UPDATE ...)

Now it works as desired:

SELECT upsert_foo(1, 'hi');
SELECT upsert_foo(1, 'bye');
SELECT upsert_foo(3, 'hi');
SELECT upsert_foo(3, 'bye');
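
As noted in the comment below, the two steps can also be combined into a single statement with a writable CTE (9.1+); a sketch using the same foo table, still subject to the concurrency caveat in the edit above:

WITH upd AS (
    UPDATE foo SET v = 'bye' WHERE k = 1 RETURNING k
)
INSERT INTO foo (k, v)
SELECT 1, 'bye'
WHERE NOT EXISTS (SELECT 1 FROM upd);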
Joey Adams
  • 1
    You can combine them into one statement if you use a writeable CTE. But like most solutions posted here, this one is wrong and will fail in the presence of concurrent updates. – Craig Ringer May 08 '15 at 07:54
0

Here is an example of how to do an upsert with parameters and without special SQL constructs, for cases where you have a special condition (sometimes you can't use 'on conflict' because you can't create the constraint):

WITH upd AS
(
    update view_layer set metadata=:metadata where layer_id = :layer_id and view_id = :view_id returning id
)
insert into view_layer (layer_id, view_id, metadata)
(select :layer_id layer_id, :view_id view_id, :metadata metadata FROM view_layer l 
where NOT EXISTS(select id FROM upd WHERE id IS NOT NULL) limit 1)
returning id

maybe it will be helpful

SalientBrain