1

I have a pandas DataFrame that consists of multiple columns that I want to store into the postgreSQL database, using .to_sql():

my_table.to_sql('table', con=engine, schema='wrhouse', if_exists='append', index=False)

I have set a primary key (date), in order to avoid duplicate entries. So above-mentioned command works when my primary key does not exist in the database.

However, if that key exists I am getting the following error:

IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "table_pkey"
DETAIL:  Key (date)=(2022-07-01 00:00:00) already exists.

Now, what I would like to do is:

  • Update the row with the already existed Key(date)
  • Insert a new row in case the Key(date) does not exist

I checked the documentation on: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html but I could't find any option by using the DataFrame.to_sql() function.

Additionally, if I change the if_exists='append' parameter to if_exists='replace', it deletes the whole table and that is not what I want.

Is there any way to update/insert rows using the .to_sql() function?

Pythoneer
  • 143
  • 8
  • 1
    Use an [UPSERT](https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT) (`INSERT .. ON CONFLICT ... DO UPDATE`). Not sure how to dumb that down for your ORM. (People should just use SQL instead of the 1000 different, mostly half-broken obfuscation layers.) – Erwin Brandstetter Nov 14 '22 at 14:01
  • @ErwinBrandstetter This line of code works, when I am explicitly typing the VALUES: `engine = conn.execute("INSERT INTO wrschema.table (date, first_hour, last_hour, quantity) VALUES ('2022-07-01 00:00:00', 15, 17, '250') ON CONFLICT (date) DO UPDATE SET first_hour = EXCLUDED.first_hour, last_hour = EXCLUDED.last_hour, quantity = EXCLUDED.quantity;")` But what would be the syntax if I want to insert the values from the DataFrame? – Pythoneer Nov 14 '22 at 17:15
  • 1
    Don't know Python / Pandas well enough. This related question should help: https://stackoverflow.com/q/61366664/939860 – Erwin Brandstetter Nov 14 '22 at 18:18
  • I have the same question. It seems to me that the 'update' a row function does not exist for pandas to_sql (it drops the whole table or appends to it). So the only option is to do this directly on SQL? – edgarbc Feb 17 '23 at 15:37

1 Answers1

1

you could convert the my_table dataframe (which holds new values you'd like to send to the table in the database) to a numpy record array and add it to the query used in the execute function in your comment ^:

values = str(list(my_table.to_records(index=False)))[1:-1]

conn.execute(f"INSERT INTO wrschema.table (date, first_hour, last_hour, quantity) VALUES {values} ON CONFLICT (date) DO UPDATE SET first_hour = EXCLUDED.first_hour, last_hour = EXCLUDED.last_hour, quantity = EXCLUDED.quantity;")

(this is something that worked for me, hope it helps!)

ela
  • 21
  • 3