2

I am trying to insert pandas Data Frame into SQL using sqlalchemy. The table is already existed in Database with three columns ID, Brand and Price. ID is identity column . How I can I check before inserting each row from pandas data frame if the Brand is already existed or not.

    import pandas as pd

    cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
            'Price': [22000,25000,27000,35000]
            }

    df = pd.DataFrame(cars, columns = ['Brand', 'Price'])

    from sqlalchemy import create_engine, MetaData, Table, select
    from six.moves import urllib

    params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=server;DATABASE=mydb;UID=user;PWD=psw")
    engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params) 
    engine.connect() 

    # suppose df is the data-frame that we want to insert in database
    df.to_sql(name='mytable',con=engine, index=False, if_exists='append')

    print("inserted)
adnan
  • 504
  • 1
  • 4
  • 21

1 Answers1

3

You are really looking at a 30 year old relational database insert pattern. INSERT if it's not already in unique index (auto increment is not a meaningful unique key)

I've used mariadb but approach is same across all DBMS. Just stick to SQL92 standard.

  1. name your temp table
  2. name the real table
  3. define what defines unique key

table definition

create table car (
    id double not null AUTO_INCREMENT,
    brand varchar(20) not null,
    price double,
    primary key (id, brand),
    unique key (brand)
)

python/sqlachemy to insert if it does not exist

import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, select

cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
            'Price': [22000,25000,27000,35000]
            }

df = pd.DataFrame(cars, columns = ['Brand', 'Price'])

engine = create_engine('mysql+pymysql://sniffer:sniffer@127.0.0.1/sniffer')

temptable = "temp"
table = "car"
key = ["Brand"]
df.to_sql(name=temptable,con=engine, index=False, if_exists='append')
conn = engine.connect()
transfersql = f"""insert into {table} ({",".join(df.columns)}) 
                 select * from {temptable} t 
                 where not exists 
                   (select 1 from {table} m 
                   where {"and".join([f" t.{col} = m.{col} " for col in key])}
                   )"""
print(transfersql)
conn.execute(transfersql)
conn.execute(f"drop table {temptable}")
conn.close()

output (generated sql)

insert into car (Brand,Price) 
                 select * from temp t 
                 where not exists 
                   (select 1 from car m 
                   where  t.Brand = m.Brand 
                   )
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30