
I am using SQLAlchemy to write to Postgres.

import sqlalchemy
from sqlalchemy import create_engine
engine = create_engine('postgresql://postgres@localhost:5432/mydb')

I want to append a pandas dataframe, df, to table in mydb.

Both have the same column names, key and value.

key is the primary key in table.

Some key values in df are already in table.

(1) Append fails:

df.to_sql('table', engine, if_exists='append')

This fails (IntegrityError) because some keys in df are already in table.

(2) I convert df to a record array and try writing to table:

conn = engine.connect()
query=''' insert or replace into table (key,value) values (?,?) '''
conn.executemany(query,df.to_records())

This works with sqlite (Appending Pandas dataframe to sqlite table by primary key).

But, it fails here (ProgrammingError: syntax error at or near "or").

What's the right way to do this using postgres and sqlalchemy?

lmart999
  • Won't the manual `executemany` with an `insert into` query also fail? (as in that case you will try inserting keys already in the table?) – joris Jan 16 '15 at 14:01
  • Yes, you may be right: with `sqlite3` library, `insert or replace` in the query solves this problem with `conn.executemany()`. If I include `or replace` in the query here, I get an error: `ProgrammingError: (ProgrammingError) syntax error at or near "or". ` – lmart999 Jan 16 '15 at 16:31
  • As far as I know, postgresql has no such feature to do an insert with update on duplicate values, see eg http://stackoverflow.com/questions/1109061/insert-on-duplicate-update-in-postgresql – joris Jan 16 '15 at 18:05
  • Got it. What about performing the `insert` for each record in the array and catching the error when the key is present? That would solve the problem. – lmart999 Jan 16 '15 at 19:26
  • What you also can do is first read the keys from sql, then filter your dataframe to those rows that don't include any of these keys, and then write that filtered frame to sql. – joris Jan 17 '15 at 10:11
  • Right. That's a good solution. – lmart999 Jan 17 '15 at 19:32
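The filter-then-append approach from the last comments can be sketched as follows. An in-memory SQLite engine stands in for the Postgres one here so the sketch is self-contained, and the table name `mytable` is a placeholder; with Postgres you would pass the `create_engine('postgresql://...')` engine instead.

```python
import pandas as pd
from sqlalchemy import create_engine

# SQLite stands in for Postgres in this sketch.
engine = create_engine('sqlite://')

# Existing table with primary key "key".
pd.DataFrame({'key': [1, 2], 'value': ['a', 'b']}).to_sql(
    'mytable', engine, index=False)

# Incoming frame: key 2 is already in the table, key 3 is new.
df = pd.DataFrame({'key': [2, 3], 'value': ['B', 'c']})

# Read the existing keys, keep only rows whose key is not yet present,
# then append just those rows.
existing = pd.read_sql('select key from mytable', engine)
new_rows = df[~df['key'].isin(existing['key'])]
new_rows.to_sql('mytable', engine, if_exists='append', index=False)
```

Note that rows with duplicate keys are silently dropped rather than updated; that matches the "filter your dataframe" suggestion, not an upsert.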

1 Answer


Iterating through the records and catching the exception on duplicate keys seems to be the best solution:

from sqlalchemy.exc import IntegrityError

# index=False so each record is just (key, value), matching the two columns.
inp = df.to_records(index=False)
for i in inp:
    q = 'insert into "table" (key, value) values (%s, %s)'
    try:
        # Parameterized values instead of string formatting.
        conn.execute(q, tuple(i))
    except IntegrityError:
        print("Key already in table.")
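Since this answer was written, a true upsert has become available: PostgreSQL 9.5+ supports `INSERT ... ON CONFLICT`, which replaces the per-row try/except entirely. SQLite 3.24+ accepts the same syntax, so the sketch below uses an in-memory SQLite database to stay self-contained; the table name `mytable` is a placeholder.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; the ON CONFLICT syntax shown
# is shared by PostgreSQL 9.5+ and SQLite 3.24+.
conn = sqlite3.connect(':memory:')
conn.execute('create table mytable (key integer primary key, value text)')
conn.execute("insert into mytable values (1, 'a'), (2, 'b')")

# Key 2 already exists (gets updated); key 3 is new (gets inserted).
rows = [(2, 'B'), (3, 'c')]
conn.executemany(
    'insert into mytable (key, value) values (?, ?) '
    'on conflict (key) do update set value = excluded.value',
    rows)
```

With psycopg2/SQLAlchemy the statement is the same apart from the `%s` paramstyle, so the whole loop collapses into a single `executemany` call.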
lmart999