16

I want to append a Pandas dataframe to an existing table called 'NewTable' in a sqlite database. NewTable has three fields (ID, Name, Age), and ID is the primary key. My database connection:

import sqlite3
DB='<path>'
conn = sqlite3.connect(DB)  

The dataframe I want to append:

import pandas as pd

test = pd.DataFrame(columns=['ID', 'Name', 'Age'])
test.loc[0, :] = 'L1', 'John', 17
test.loc[1, :] = 'L11', 'Joe', 30

As mentioned above, ID is the primary key in NewTable. The key 'L1' is already in NewTable, but key 'L11' is not. I try to append the dataframe to NewTable.

from pandas.io import sql 
sql.write_frame(test,name='NewTable',con=conn,if_exists='append')

This throws an error:

IntegrityError: column ID is not unique

The error is likely due to the fact that key 'L1' is already in NewTable. Neither of the entries in the dataframe is appended to NewTable. However, I can append dataframes containing only new keys to NewTable without problem.

Is there a simple way (e.g., without looping) to append Pandas dataframes to a sqlite table such that new keys in the dataframe are appended, but keys that already exist in the table are not?

Thanks.

lmart999
  • 6,671
  • 10
  • 29
  • 37
  • 3
    For what it's worth, if you find yourself doing this a lot, sqlalchemy is a great tool for interacting with sql from python. If you get set up with that, the session.merge(row) function will do exactly that, replacing a row with an existing primary key and adding it otherwise. – RussellStewart May 09 '14 at 23:11
  • Thanks. I will look into sqlalchemy. – lmart999 May 10 '14 at 08:23
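For reference, a minimal sketch of the sqlalchemy `session.merge()` approach suggested in the comment above, assuming SQLAlchemy 1.4+ and an illustrative ORM mapping of NewTable (the class, engine URL, and per-row loop are assumptions, not part of the original question). Note that it iterates over rows, unlike the accepted answer below:

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class NewTable(Base):
    # Illustrative mapping of the existing NewTable table.
    __tablename__ = 'NewTable'
    ID = Column(String, primary_key=True)
    Name = Column(String)
    Age = Column(Integer)

engine = create_engine('sqlite:///<path>')  # same database file as above

with Session(engine) as session:
    for _, row in test.iterrows():
        # merge() updates the row if the primary key already exists,
        # otherwise it inserts a new row.
        session.merge(NewTable(ID=row['ID'], Name=row['Name'], Age=int(row['Age'])))
    session.commit()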

2 Answers

12

You can use SQLite's `insert or replace` functionality:

# INSERT OR REPLACE updates rows whose primary key already exists and
# inserts the rest, so duplicate keys no longer raise IntegrityError.
query = '''insert or replace into NewTable (ID, Name, Age) values (?, ?, ?)'''
conn.executemany(query, test.to_records(index=False))  # index=False keeps the columns aligned with (ID, Name, Age)
conn.commit()
Happy001
  • 6,103
  • 2
  • 23
  • 16
  • 1
    Thanks, this works. Minor addition: the index is excluded from the record array to match the table structure. `conn.executemany(query,test.to_records(index=False))` – lmart999 May 10 '14 at 08:19
  • 1
    That's right. I usually use primary key as index in df, so I forgot that. – Happy001 May 10 '14 at 13:38
  • 1
    Getting a binary data column instead of datetime when I use this. – Sid Oct 17 '19 at 18:01
  • Is there any way to push the entire pandas dataframe as a whole rather than needing to write out the individual values? It looks like with the query code that you need to explicitly write the individual values: ```insert or replace into NewTable (ID,Name,Age) values (?,?,?)``` – grantog Aug 23 '20 at 15:57
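Responding to the last comment above: on a newer pandas (0.24+), `DataFrame.to_sql` accepts a `method` callable, so the whole dataframe can be pushed in one call without writing out the values clause by hand. A sketch under that assumption; `insert_or_replace` is just an illustrative helper name, and swapping `insert or replace` for `insert or ignore` would skip existing keys instead of overwriting them:

def insert_or_replace(pd_table, conn, keys, data_iter):
    # Custom to_sql insert method: build an INSERT OR REPLACE statement
    # from the table name and column names that pandas passes in.
    columns = ', '.join(keys)
    placeholders = ', '.join(['?'] * len(keys))
    stmt = f'insert or replace into "{pd_table.name}" ({columns}) values ({placeholders})'
    conn.executemany(stmt, data_iter)

test.to_sql('NewTable', conn, if_exists='append', index=False, method=insert_or_replace)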
1

http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.io.sql.write_frame.html

Curious if you tried `replace`?

if_exists: {‘fail’, ‘replace’, ‘append’}, default ‘fail’
Machavity
  • 30,841
  • 27
  • 92
  • 100
sudhir
  • 271
  • 3
  • 4
  • 5
    `replace` in this case will not work as it will check if the table, not the record, already exists. If it does, then the entire table will be dropped and then recreated with the new dataframe. – Cedric Le Varlet Oct 10 '17 at 12:44
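To illustrate the behaviour described in the comment above, using the same deprecated `write_frame` API as the question: `replace` acts at the table level, not the row level, so it does not solve the duplicate-key problem.

# 'replace' drops NewTable entirely and recreates it from `test`,
# so any rows already in the table but missing from the dataframe are lost.
sql.write_frame(test, name='NewTable', con=conn, if_exists='replace')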