I need to insert a lot of data inside a PostgreSQL database. These data are stored into a CSV file.
I've created a python script for inserting data (they can be inserted directly because I need to manipulate data first).
At some point I've something like this
connection = psycopg2.connect(connection data...)
cursor = connection.cursor()
for row in rows:
# calculate values from row, then...
try:
query = "insert into public.my_table (x, y, z) values (" + x + ", " + y + ", " + z, ");"
cursor.execute(query)
except:
manage_error()
finally:
cursor.commit()
This code works. The problem is that it commits every insert
so I want to speed up things by commit changes only after the entire csv is loaded (many thousands of rows).
I've tried:
connection = psycopg2.connect(connection data...)
cursor = connection.cursor()
for row in rows:
# calculate values from row, then...
try:
query = "insert into public.my_table (x, y, z) values (" + x + ", " + y + ", " + z, ");"
cursor.execute(query)
except:
manage_error()
# Commit after inserting all rows
cursor.commit()
It almost works.
The problem is that I've some row in csv files that have some error giving me a duplicated key. This way the exception is raised. Even if I continue to the next row, PostgreSQL aborts the entire transaction so at the end I don't commit any row of the file.
I need to find a way to use the transaction and ignore errors inside the transaction itself, so I can ignore rows with error (after handled them in the exception) and continue to import all the valid lines of the csv.
I've about 20 rows with error in csv files with about 800.000 rows, so aborting the transaction for these few data is a pain, and from the number of rows you can imagine that commit them one a time is quite slow. Think also that I've a very big amount of these files to import.
I've seen the ON_ERROR_ROLLBACK
feature that can solve my problem, but unfortunately I cannot use it since it's a feature of psql
client, and I don't use it because I use the python script with psycopg2
library.
So, how can I ignore error in transaction without using ON_ERROR_ROLLBACK
, since I cannot use it?