How to "merge" two tables efficiently in SQL?

Question

I have two tables (e.g. 'foo' and 'bar') with identical structure and a primary key (e.g. 'a'):

CREATE TABLE foo(a INT PRIMARY KEY, b VARCHAR(10));
INSERT INTO foo(a, b) VALUES (1, 'foo'), (2, 'foo');
CREATE TABLE bar(a INT PRIMARY KEY, b VARCHAR(10));
INSERT INTO bar(a, b) VALUES (2, 'bar'), (3, 'bar');

Now I want to update the rows of table 'foo' with the values from table 'bar' where the primary key matches, and insert the rows from 'bar' into 'foo' where the primary key does not exist in 'foo' yet:

UPDATE foo SET b = bar.b FROM bar WHERE foo.a = bar.a;
INSERT INTO foo SELECT bar.* FROM bar LEFT JOIN foo USING (a) WHERE foo.a IS NULL;

This does what I want, but I wonder whether there is a more efficient way to do it.

Yes; I just found that from the [documentation](http://www.postgresql.org/docs/9.3/static/unsupported-features-sql-standard.html). It explicitly lists F312 MERGE statement (and F313 and F314, embellishments to MERGE) as not supported in PostgreSQL 9.3.5. In the absence of MERGE, I doubt if there's a significantly better way to do it than what you're doing already, surrounded by a transaction or savepoint. — Jonathan Leffler, Aug 24 '14 at 19:17
How did you get to this situation? You may want to review how / why you have two tables that are out of sync. — ErstwhileIII, Aug 24 '14 at 20:42
http://dba.stackexchange.com/questions/13468/most-idiomatic-way-to-implement-upsert-in-postgresql-nowadays — Gordon Linoff, Aug 25 '14 at 01:17
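
Note for readers on later releases: the comments above concern PostgreSQL 9.3. PostgreSQL 9.5 added INSERT ... ON CONFLICT, and PostgreSQL 15 added a MERGE statement. The following is a minimal sketch of both, using the question's tables foo and bar. The two statements are alternatives, not steps, and which one applies depends on your server version. The aliases t and s are only illustrative.

```sql
-- Alternative 1, PostgreSQL 9.5 or later: single-statement upsert.
-- EXCLUDED refers to the row that was proposed for insertion.
INSERT INTO foo (a, b)
SELECT a, b
FROM   bar
ON     CONFLICT (a) DO UPDATE
SET    b = EXCLUDED.b;

-- Alternative 2, PostgreSQL 15 or later: the MERGE statement.
MERGE INTO foo t
USING bar s ON t.a = s.a
WHEN MATCHED THEN
   UPDATE SET b = s.b
WHEN NOT MATCHED THEN
   INSERT (a, b) VALUES (s.a, s.b);
```

Either form replaces the UPDATE / INSERT pair from the question with one statement. Whether it is faster on a given data set is something to measure rather than assume.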

score 2 · Accepted Answer · edited May 23 '17 at 12:02

If your tables are not accessed by others concurrently (yet), you could use a FULL [OUTER] JOIN query to create a new, merged table, with values from bar getting priority.

If you have concurrent access, but can afford to lock both tables, that works, too:

BEGIN;
LOCK foo, bar;                     -- if you need it

CREATE TABLE baz AS
SELECT a, COALESCE(b.b, f.b) AS b  -- bar gets priority
FROM   foo f
FULL   JOIN bar b USING (a)
ORDER  BY a;                       -- optional order by

-- you need table name foo?
DROP  TABLE foo, bar;
ALTER TABLE baz RENAME TO foo;
ALTER TABLE foo ADD CONSTRAINT foo_a_pkey PRIMARY KEY (a);
-- do more?

COMMIT;

If you have substantial overlap, it's more efficient to write a new, pristine (clustered) table without dead rows, than to update much of the old. If the overlap is not big, update / insert may be more efficient. If both tables are small, don't bother and go with the simple solution.
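
To judge which case applies, it may help to measure the overlap first. The sketch below uses the question's tables foo and bar. The column aliases overlapping_rows and new_rows are made up for illustration, and the figures in pg_stat_user_tables are estimates kept by the statistics collector, not exact counts.

```sql
-- How many rows of bar would update an existing row in foo (the overlap),
-- and how many would be inserted as new rows?
SELECT count(f.a)            AS overlapping_rows
     , count(*) - count(f.a) AS new_rows
FROM   bar b
LEFT   JOIN foo f USING (a);

-- Estimated live and dead rows currently in foo.
SELECT n_live_tup, n_dead_tup
FROM   pg_stat_user_tables
WHERE  relname = 'foo';
```

With the sample data from the question, the first query reports one overlapping row (a = 2) and one new row (a = 3).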

Obviously, the new table has none of the indexes or constraints of the old one. Recreate what you need.
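
One way to know what to recreate is to capture the definitions before the old table is dropped. This is a sketch, assuming foo lives in the public schema. It has to run before DROP TABLE foo, because the catalog entries disappear with the table.

```sql
-- Index definitions, including the index that backs the primary key.
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  schemaname = 'public'   -- assumption: foo is in schema public
AND    tablename  = 'foo';

-- Constraint definitions (primary key, foreign keys, checks, ...).
SELECT conname, pg_get_constraintdef(oid) AS definition
FROM   pg_constraint
WHERE  conrelid = 'public.foo'::regclass;
```

The indexdef column holds a complete CREATE INDEX statement, and the constraint definitions can be pasted into ALTER TABLE ... ADD CONSTRAINT after the rename.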

If you have a lot of dependent objects (views, functions), you may want to keep your old table. Create a temp table instead, TRUNCATE foo and write the data back into the same table. This also won't kill concurrent transactions waiting for the existing table.

BEGIN;
LOCK foo, bar;                     -- if you need it

SET temp_buffers = '500MB';        -- if you need it; set before the session first uses temp tables

CREATE TEMP TABLE tmp AS
SELECT a, COALESCE(b.b, f.b) AS b  -- bar gets priority
FROM   foo f
FULL   JOIN bar b USING (a);

-- for performance, you might want to drop indexes and constraints here ...
TRUNCATE foo;
INSERT INTO foo
SELECT * FROM tmp
ORDER BY a;                         -- optional

DROP  TABLE  bar;                   -- optional
-- ... and recreate previously dropped indexes and constraints here

COMMIT;

About temp_buffers:

How to update selected rows with values from a CSV file in Postgres?

What happens to concurrent writes?

Best way to populate a new column in a large table?

Thanks for the comprehensive answer, but how is this more efficient or simpler or in any other way better than what I am already doing with the INSERT and UPDATE statements? — Christian Schlichtherle, Sep 04 '14 at 09:34
@ChristianSchlichtherle: The result is the same, logically (as it must be). But performance is superior (depending on circumstances as described) and you get a pristine (optionally "clustered") table without dead rows in indexes and table. Especially for big tables and / or lots of constraints / indexes on the table. — Erwin Brandstetter, Sep 04 '14 at 17:38
