21

Using Python 2.7 and

In [150]: psycopg2.version Out[150]: '2.4.2 (dt dec pq3 ext)'

I have a simple python scripts that processing transactions and writes data to a database. Occasionally there is an insert that violates my primary key. This is fine, i just want it to ignore that record and continue on it merry way. The problem I am having is that psycopg2 primary key error is aborting the entire transaction block and all inserts after the error fail. Here is an example error

ERROR: duplicate key value violates unique constraint "encounter_id_pkey"
DETAIL:  Key (encounter_id)=(9012235) already exists.

This is on the next insert. not a violation.

Inserting: 0163168~9024065
ERROR: current transaction is aborted, commands ignored until end of transaction block

The Second error repeats itself for every insert. Here is a simplified loop. I am looping through a pandas data frame, but it could be any loop.

conn = psycopg2.connect("dbname='XXXX' user='XXXXX' host='XXXX' password='XXXXX'")

cur = conn.cursor()

for i, val in df2.iteritems():
    try:
        cur = conn.cursor()
        cur.execute("""insert into encounter_id_table (
        encounter_id,current_date  )
        values       
        (%(create_date)s, %(encounter_id)s ) ;""",
        'encounter_id':i.split('~')[1],  
        'create_date': datetime.date.today() })           
        cur.commit()
        cur.close() 
    except Exception , e:
        print 'ERROR:', e[0]
        cur.close()
 conn.close()   

Again the basic idea is to gracefully handle the Error. In the dictum of Admiral Nelson of the Royal Navy: "Damn the maneuvers go straight at them". Or in our case damn the Errors go straight at them." I thought by opening a cursor on every insert that I would be resetting the transaction block. I do not want to have to reset the connection just because of a primary key error. Is there something i am just missing?

Thanks before hand for your time.

John

jdennison
  • 2,000
  • 2
  • 15
  • 22

2 Answers2

26

You should rollback transaction on error.

I've added one more try..except..else construction in the code bellow to show the exact place where exception will occur.

try:
    cur = conn.cursor()

    try:
        cur.execute("""insert into encounter_id_table (
            encounter_id,current_date  )
            values       
            (%(create_date)s, %(encounter_id)s ) ;""",
            'encounter_id':i.split('~')[1],  
            'create_date': datetime.date.today() })
    except psycopg2.IntegrityError:
        conn.rollback()
    else:
        conn.commit()

    cur.close() 
except Exception , e:
    print 'ERROR:', e[0]
lig
  • 3,567
  • 1
  • 24
  • 36
  • This is is almost the answer. But I need to call the rollback() method on the connection object ( conn) here. The cursor only has commit not rollback. Thanks – jdennison Dec 14 '11 at 00:45
  • More method errors. You have to call commit on the connection not the cursor. – jdennison Dec 14 '11 at 01:16
  • I was wondering that commit and rollback is connection methods. Think that somebody should edit my code to correct it. Sadly i'm not very familiar with direct `psycopg` use as i'm using ORM or non-relational databases usually. – lig Dec 14 '11 at 08:39
  • If you have to do the above, you'd rather want the connection in autocommit, and just ignore errors. – piro Dec 19 '11 at 16:46
  • FWIW, you can call `cur.execute("COMMIT;")` or `cur.execute("ROLLBACK;")` instead of calling the methods on the connection object. – Fake Name Sep 22 '14 at 05:25
3

First of all: CURRENT_DATE is a reserved word in every SQL standard as well as in PostgreSQL. You cannot use it as identifier without double-quoting it. I'd rather avoid that, so I use the column name curdate in my example.

Next, you seem to have reversed the order of your insert-columns:

(%(create_date)s, %(encounter_id)s )

Should be:

( %(encounter_id)s, %(create_date)s)

To your main question: Avoid the problem altogether!

Since Postgres 9.5 INSERT ... ON CONFLICT DO NOTHING is the way to go. This also avoids race conditions under concurrent write load:

INSERT INTO encounter_id_table (encounter_id, curdate)
VALUES (1234, CURRENT_DATE)
ON CONFLICT DO NOTHING;

See:

In Python syntax, that should be:

cur.execute("""INSERT INTO encounter_id_table (encounter_id, curdate)
    VALUES (%(encounter_id)s, CURRENT_DATE);""",
  {'encounter_id':i.split('~')[1]})       

In older versions you can mostly avoid the problem by checking if a key is already in the table before inserting it:

INSERT INTO encounter_id_table (encounter_id, curdate)
SELECT 1234, now()::date
WHERE  NOT EXISTS (SELECT FROM encounter_id_table t
                   WHERE t.encounter_id = 1234);

Can fail under concurrent write load, though.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Unless you specifically set the isolation level to something more strict, using the WHERE NOT EXISTS syntax will still create a race condition. – lusional Oct 11 '16 at 18:45
  • 1
    @lusional: In Postgres 9.5 or later, you best use `INSERT .. ON CONFLICT DO NOTHING`. Compare: http://stackoverflow.com/questions/17267417/how-to-upsert-merge-insert-on-duplicate-update-in-postgresql. Changing the isolation level to `Serializable` would be way more expensive. – Erwin Brandstetter Oct 20 '16 at 03:17
  • Agreed - using the INSERT .. ON CONFLICT syntax is much better where possible. Race conditions are nasty when not dealt with properly. – lusional Dec 15 '16 at 21:25