First just for completeness I changed Erwin's code to use except
CREATE TEMP TABLE tmp AS SELECT * FROM tbl LIMIT 0 -- copy layout, but no data
COPY tmp FROM '/absolute/path/to/file' FORMAT csv;
INSERT INTO tbl
SELECT tmp.*
FROM tmp
except
select *
from tbl
DROP TABLE tmp;
Then I resolved to test it myself. I tested it in 9.1 with a mostly untouched postgresql.conf
. The target table contains 10 million rows and the origin table 30 thousand. 15 thousand already exists in the target table.
create table tbl (id integer primary key)
;
insert into tbl
select generate_series(1, 10000000)
;
create temp table tmp as select * from tbl limit 0
;
insert into tmp
select generate_series(9985000, 10015000)
;
I asked for the explain of the select part only. The except
version:
explain
select *
from tmp
except
select *
from tbl
;
QUERY PLAN
----------------------------------------------------------------------------------------
HashSetOp Except (cost=0.00..270098.68 rows=200 width=4)
-> Append (cost=0.00..245018.94 rows=10031897 width=4)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..771.40 rows=31920 width=4)
-> Seq Scan on tmp (cost=0.00..452.20 rows=31920 width=4)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..244247.54 rows=9999977 width=4)
-> Seq Scan on tbl (cost=0.00..144247.77 rows=9999977 width=4)
(6 rows)
The outer join
version:
explain
select *
from
tmp
left join
tbl using (id)
where tbl.id is null
;
QUERY PLAN
--------------------------------------------------------------------------
Nested Loop Anti Join (cost=0.00..208142.58 rows=15960 width=4)
-> Seq Scan on tmp (cost=0.00..452.20 rows=31920 width=4)
-> Index Scan using tbl_pkey on tbl (cost=0.00..7.80 rows=1 width=4)
Index Cond: (tmp.id = id)
(4 rows)