26

I have a PostgreSQL database. Pandas has a `to_sql` function to write the records of a dataframe into a database, but I haven't found any documentation on how to use pandas to update existing database rows once I'm finished with the dataframe.

Currently I am able to read a database table into a dataframe using pandas' `read_sql_table`. I then work with the data as necessary. However, I haven't been able to figure out how to write that dataframe back into the database to update the original rows.

I don't want to overwrite the whole table; I just need to update the rows that were originally selected.

darkpool
  • Upon further research I still haven't found a solution. It seems you can pass `if_exists='append'` to pandas' `to_sql`, but there doesn't seem to be an `if_exists='update'`. I find this really strange. What is the suggested way to get a pandas dataframe back into a database and update any rows that have changed? Surely this is a common task? – darkpool Apr 14 '15 at 07:29
  • Are you adding or removing rows in the dataframe? If you are not, I guess you know you could do it with plain queries. But even if you are adding or removing rows, there may be a simple way to do it with queries after dropping the primary key. If you give a bit more description, I think there is a chance for a relatively simple answer just using queries. – lrnzcig Apr 14 '15 at 14:23
  • `to_sql` does not support updates. The best approach I have found so far is to create an `ON INSERT` trigger on the database table that updates all fields when a row with a duplicate primary key is inserted. – ostrokach Oct 01 '15 at 01:03
  • Are your data large? A blunt-instrument solution could be: read the database table again into an `original_table` dataframe, call `original_table.update(modified_table)` where `modified_table` is your dataframe, and then `...to_sql(if_exists='replace')` with this new dataframe object. – James Nov 13 '15 at 15:41
  • This should work next year when Postgres 9.5 is released, which supports UPSERT! – Migwell Dec 26 '15 at 16:10
  • I am trying to do this too, it seems it is the one missing piece for common database manipulation in pandas. Could something like this be done with an H5 format? It seems that in the H5 format you need to delete the old row and append a new row as well but maybe there is a simple update method I missed in the docs. – derchambers Feb 19 '16 at 23:23
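The upsert the comments above allude to (PostgreSQL 9.5's `INSERT ... ON CONFLICT DO UPDATE`) can be driven from pandas by hand with `executemany`. A minimal sketch below, using SQLite (which supports the same `ON CONFLICT` syntax since version 3.24) so it is self-contained; the table name `t` and column names are illustrative, and with PostgreSQL you would use a psycopg2 connection and `%s` placeholders instead:

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
con.execute("INSERT INTO t VALUES (1, 'old')")

# Dataframe with one changed row (id=1) and one new row (id=2).
df = pd.DataFrame({"id": [1, 2], "val": ["new", "added"]})

# Plain Python types for the DB-API driver; numpy scalars can trip it up.
rows = [(int(r.id), r.val) for r in df.itertuples(index=False)]
con.executemany(
    "INSERT INTO t (id, val) VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET val = excluded.val",
    rows,
)
con.commit()

print(con.execute("SELECT * FROM t ORDER BY id").fetchall())
# -> [(1, 'new'), (2, 'added')]
```

Existing primary keys are updated in place and new ones are inserted, which is exactly the `if_exists='update'` behaviour the question asks for.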

2 Answers

5

One way is to make use of an SQLAlchemy "table class" (a mapped class) together with `session.merge(row)` and `session.commit()`:

Here is an example:

for i in range(len(df)):
    row_data = table_class(column_1=df.iloc[i]['column_name'],
                           column_2=df.iloc[i]['column_name'],
                           ...
                           )
    session.merge(row_data)
session.commit()
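To make this concrete, here is a self-contained sketch of the `session.merge()` approach against an in-memory SQLite database. The mapped class `Record`, the table name `records`, and the column names are placeholders standing in for the answer's `table_class`; adapt them to your own schema (requires SQLAlchemy 1.4+):

```python
import pandas as pd
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Record(Base):  # hypothetical mapped class, i.e. the "table class"
    __tablename__ = "records"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Seed the table, read it into a dataframe, and modify one row.
with Session(engine) as session:
    session.add_all([Record(id=1, name="old"), Record(id=2, name="keep")])
    session.commit()

df = pd.read_sql_table("records", engine)
df.loc[df["id"] == 1, "name"] = "new"

# merge() issues an UPDATE when the primary key already exists
# in the table, and an INSERT otherwise.
with Session(engine) as session:
    for row in df.itertuples(index=False):
        session.merge(Record(id=int(row.id), name=row.name))
    session.commit()
```

Only the rows present in the dataframe are touched; the rest of the table is left alone, which is what the question asks for.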
johan855
0

For the SQLAlchemy case of reading a table into a df, changing the df, and then updating the table from the df, I found that `df.to_sql` works with `name=<table_name>`, `index=False`, and `if_exists='replace'`.

This replaces the old values in the table with the ones you changed in the df.
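A self-contained sketch of that route, using an in-memory SQLite engine; the table name `prices` and its columns are made up for illustration:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")
pd.DataFrame({"id": [1, 2], "price": [10.0, 20.0]}).to_sql(
    "prices", engine, index=False)

# Read, modify, write back.
df = pd.read_sql_table("prices", engine)
df.loc[df["id"] == 2, "price"] = 25.0

# if_exists='replace' drops and recreates the table, so any constraints
# or indexes defined in the database (rather than by pandas) are lost --
# acceptable only when the dataframe holds the entire table.
df.to_sql("prices", engine, index=False, if_exists="replace")
```

Note this is the "overwrite the whole table" approach the question hoped to avoid, so it only fits when the dataframe contains every row of the table.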

Eurofut