Which SQL pattern is faster to avoid inserting duplicate rows?

Question

I know of two ways to insert without duplication. The first is using a WHERE NOT EXISTS clause:

INSERT INTO table_name (col1, col2, col3)
SELECT %s, %s, %s
WHERE NOT EXISTS (
    SELECT * FROM table_name AS T
    WHERE T.col1 = %s
      AND T.col2 = %s)

the other is doing a LEFT JOIN:

INSERT INTO table_name (col1, col2, col3)
SELECT %s, %s, %s
FROM ( SELECT %s, %s, %s ) A
LEFT JOIN table_name B
ON  B.COL1 = %s
AND B.COL2 = %s
WHERE B.id IS NULL
LIMIT 1

Is there a general rule as to one being faster than the other, or does it depend on the tables? Is there a different way which is better than both?

Are you sure you want a SELECT * in the first example? Select all the columns? Generally, one wouuldn't do that, but I'm no Postgres expert. — DOK, Jun 25 '10 at 18:05
Can you run both queries with EXPLAIN and post that? The analysis tools are there to help answer these questions on a case by case basis. — Freiheit, Jun 25 '10 at 18:05

Bill Karwin · Answer 1 · 2010-06-25T19:07:52.283

5

I would recommend defining a UNIQUE constraint on the column(s) you need to be unique (col1 & col2 in this case), and then just do the INSERT. Handle exceptions as needed.

Re your comment about the exception demanding a rollback, the solution for PostgreSQL is to set a transaction savepoint before you try the insert that may cause an exception. If you get the exception, rollback to the savepoint.

See:

edited Jun 25 '10 at 19:07

answered Jun 25 '10 at 18:05

Bill Karwin

538,548
86
673
828

i have the unique constraints in place already. however, when they're violated, i have to either commit or rollback the entire transaction so far. each transaction of mine requires multiple inserts, though, and i don't want to partially commit some of them. is there any way to handle exceptions in a better way, that doesn't mess up the cursor/transaction? (doing this from Python, with sqlobject or psycopg2) – Claudiu Jun 25 '10 at 18:21
Yes: *catch the exception*. If it's a duplicate key violation, then you can ignore it because it means your row already exists. If it's another type of exception, you would have had that problem anyway (e.g. SQL syntax error, disk full, network connection broken). – Bill Karwin Jun 25 '10 at 18:34
if i catch the exception, then attempt to execute another query with the same cursor, i get a `psycopg2.InternalError: current transaction aborted; ignoring actions until end of transaction block`, or something of the sort – Claudiu Jun 25 '10 at 18:41
Here is the exact error message: `psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block`. what can i do to prevent this state after failing to insert a table? – Claudiu Jun 25 '10 at 18:52
i've tried to ask the question more eloquently here: http://stackoverflow.com/questions/3120688/sql-continue-executing-queries-after-duplicate-key-violation – Claudiu Jun 25 '10 at 18:56

score -1 · Answer 2 · answered Jul 27 '10 at 07:08

I think using EXISTS is more efficient!You could do like this:

if exists(select 1 from table_name where col1 = %s and col2 = %s) then
  insert into table_name (col1, col2, col3)
  select %s, %s, %s;
end if;

under test,using EXISTS is about 50 times faster then using NOT EXISTS.

another method is using EXCEPT .

INSERT INTO table_name (col1, col2, col3)
SELECT %s, %s, %s
except
select col1, col2, col3 from table_name

under test,using EXCEPT is about 3 times faster then using NOT EXISTS.

Which SQL pattern is faster to avoid inserting duplicate rows?

2 Answers2

Linked