I have a pandas DataFrame that I need to store into the database. Here's my current line of code for inserting:
df.to_sql(table,con=engine,if_exists='append',index_label=index_col)
This works fine if none of the rows in df
exist in my table. If a row already exists, I get this error:
sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) duplicate key
value violates unique constraint "mypk"
DETAIL: Key (id)=(42) already exists.
[SQL: 'INSERT INTO mytable (id, owner,...) VALUES (%(id)s, %(owner)s,...']
[parameters:...] (Background on this error at: http://sqlalche.me/e/gkpj)
and nothing is inserted.
PostgreSQL has optional ON CONFLICT
clause, which could be used to UPDATE
the existing table rows. I read entire pandas.DataFrame.to_sql manual page and I couldn't find any way to use ON CONFLICT
within DataFrame.to_sql()
function.
I have considered spliting my DataFrame in two based on what's already in the db table. So now I have two DataFrames, insert_rows
and update_rows
, and I can safely execute
insert_rows.to_sql(table, con=engine, if_exists='append', index_label=index_col)
But then, there seems to be no UPDATE
equivalent to DataFrame.to_sql()
. So how do I update the table using DataFrame update_rows
?