I need to upload multiple excel files to a postgresql table but they can olverlap each other in several registers, therefore I need to be aware of IntegrityErrors
. I'm following two approaches:
cursor.copy_from
: The fastest approach but I don't know how to catch and control all Integrityerrors
due to duplicate registers
streamCSV = StringIO()
streamCSV.write(invoicing_info.to_csv(index=None, header=None, sep=';'))
streamCSV.seek(0)
with conn.cursor() as c:
c.copy_from(streamCSV, "staging.table_name", columns=dataframe.columns, sep=';')
conn.commit()
cursor.execute
: I can count and handle each exception but it is very
slow.
data = invoicing_info.to_dict(orient='records')
with cursor as c:
for entry in data:
try:
c.execute(DLL_INSERT, entry)
successful_inserts += 1
connection.commit()
print('Successful insert. Operation number {}'.format(successful_inserts))
except psycopg2.IntegrityError as duplicate:
duplicate_registers += 1
connection.rollback()
print('Duplicate entry. Operation number {}'.format(duplicate_registers))
At the end of the routine, I need to determine the following info:
print("Initial shape: {}".format(invoicing_info.shape))
print("Successful inserts: {}".format(successful_inserts))
print("Duplicate entries: {}".format(duplicate_registers))
How can I modify the first approach to control all exceptions? How can I optimize the second approach?