
I have a situation where I want to insert a row if it doesn't exist, and not insert it if it already does. I tried writing SQL queries that prevented this from happening (see here), but I was told a solution is to create constraints and catch the exception when they're violated.

I have constraints in place already. My question is - how can I catch the exception and continue executing more queries? If my code looks like this:

cur = transaction.cursor()

#execute some queries that succeed

try:
    cur.execute(fooquery, bardata)  #this query might fail, but that's OK
except psycopg2.IntegrityError:
    pass

cur.execute(fooquery2, bardata2)

Then I get an error on the second execute:

psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block

How can I tell the computer that I want it to keep executing queries? I don't want to transaction.commit(), because I might want to roll back the entire transaction (the queries that succeeded before).

Claudiu
    I would have recommended your first approach. I think it is far less work for the database to only insert rows that don't exist. Catching errors requires doing the work and then un-doing it, which is approximately twice as much work. If you are only adding one record at a time, I get that idea, but for multiple records, or thousands of records, I think you were better off the first way. That being said, I have no useful answer to your current question. – MJB Jun 25 '10 at 19:01
  • @MJB: It only results in more work when an error occurs. If the insert goes through with no error, then it's smooth sailing. So it depends on whether you expect it will be more common to get a duplicate key violation or not to get a duplicate key violation. Optimize for the more common case. – Bill Karwin Jun 25 '10 at 19:10
  • I expect to have duplicate key violations more often than not, so I guess that answers my question! – Claudiu Jun 25 '10 at 19:13
  • @Bill: I agree with planning for the more common approach. I only commented because it seemed that information was not yet available from the OP. It appears it now is. – MJB Jun 25 '10 at 19:48

2 Answers


I think what you could do is use a SAVEPOINT before trying to execute the statement which could cause the violation. If the violation happens, then you could rollback to the SAVEPOINT, but keep your original transaction.
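
A rough sketch of how that might look with psycopg2, reusing the names from the question (the savepoint name here is just an example, not part of the original answer):

cur = transaction.cursor()

#execute some queries that succeed

cur.execute("SAVEPOINT before_foo")
try:
    cur.execute(fooquery, bardata)  #this query might fail, but that's OK
except psycopg2.IntegrityError:
    cur.execute("ROLLBACK TO SAVEPOINT before_foo")  #undo only the failed statement
else:
    cur.execute("RELEASE SAVEPOINT before_foo")

cur.execute(fooquery2, bardata2)  #the transaction is still usable here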

Here's another thread which may be helpful: Continuing a transaction after primary key violation error

dcp

I gave an up-vote to the SAVEPOINT answer--especially since it links to a question where my answer was accepted. ;)

However, given your statement in the comments section that you expect errors "more often than not," may I suggest another alternative?

This solution actually harkens back to your other question. The difference here is how to load the data very quickly into the right place and format, so that moving it into the target table is a single SELECT -and- the approach is generic for any table you want to populate (so the same code could be used for multiple different tables). Here's a rough layout of how I would do it in pure PostgreSQL, assuming I had a CSV file in the same format as the table to be inserted into:

CREATE TEMP TABLE input_file (LIKE target_table);

COPY input_file FROM '/path/to/file.csv' WITH CSV;

INSERT INTO target_table
SELECT * FROM input_file
WHERE (<unique key field list>) NOT IN (
    SELECT <unique key field list>
    FROM target_table
);

Okay, this is an idealized example and I'm also glossing over several things (like reporting back the duplicates, pushing in-memory data from Python into the temp table, using COPY from STDIN rather than from a file, etc.), but hopefully the basic idea is there, and it's going to avoid much of the overhead if you expect more records to be rejected than accepted.
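
If the data is coming from Python rather than a server-side file, a minimal sketch of that same idea using psycopg2's copy_expert to COPY from STDIN might look like the following (the connection string, table, and key column are hypothetical, not from the answer):

import io
import psycopg2

conn = psycopg2.connect("dbname=test")  #hypothetical connection string
cur = conn.cursor()

#stage the incoming rows in a temp table shaped like the target
cur.execute("CREATE TEMP TABLE input_file (LIKE target_table)")

#csv_data holds the rows to load, in the same column order as target_table
csv_data = io.StringIO("1,foo\n2,bar\n")
cur.copy_expert("COPY input_file FROM STDIN WITH CSV", csv_data)

#insert only the rows whose unique key isn't already present
cur.execute("""
    INSERT INTO target_table
    SELECT * FROM input_file
    WHERE (id) NOT IN (SELECT id FROM target_table)
""")

conn.commit()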

Matthew Wood