INSERT INTO SELECT based on a dataframe

Question

I have a dataframe df and I want to to execute a query to insert into a table all the values from the dataframe. Basically I am trying to load as the following query:

INSERT INTO mytable
SELECT *
FROM mydataframe

For that I have the following code:

import pyodbc
import pandas as pd

connection = pyodbc.connect('Driver={' + driver + '} ;'
                            'Server=' + server + ';'
                            'UID=' + user + ';'
                            'PWD=' + pass + ';')

cursor = connection.cursor()

query = 'SELECT * FROM [myDB].[dbo].[myTable]'
df = pd.read_sql_query(query, connection)

sql = 'INSERT INTO [dbo].[new_date] SELECT * FROM :x'
cursor.execute(sql, x=df)
connection.commit()

However, I am getting the following error:

TypeError: execute() takes no keyword arguments

Does anyone know what I am doing wrong?

Why are you not using the counterpart of `read_sql` with [**`to_sql`**](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html)? Do not note the latter requires an SQLAlchemy connection (not raw DB-API one like `pyodbc`). — Parfait, Apr 19 '20 at 16:24
I've tried but I found a lot of issues like 'Data source name not found and no default driver specified (0) (SQLDriverConnect)')' — Pedro Alves, Apr 19 '20 at 16:36
See SQLAlchemy docs of MS SQL Server for setup of connection: https://docs.sqlalchemy.org/en/13/dialects/mssql.html#module-sqlalchemy.dialects.mssql.pyodbc — Parfait, Apr 19 '20 at 16:42

Michel Guimarães · Answer 1 · 2020-04-19T17:01:53.197

2

I had some issues to connect pandas with SQL Server too. But I've get this solution to write my df:

import pyodbc
import sqlalchemy

engine = sqlalchemy.create_engine('mssql+pyodbc://{0}:{1}@{2}:{3}/{4}?driver={5}'.format(username,password,server,port,bdName,driver))
pd.to_sql("TableName",con=engine,if_exists="append")

edited Apr 19 '20 at 17:01

answered Apr 19 '20 at 16:22

Michel Guimarães

413
3
10

I really think that's better use pandas `to_sql()` than SQL `INSERT`statement to do this! – Michel Guimarães Apr 19 '20 at 16:24
I've tried but I found a lot of issues like 'Data source name not found and no default driver specified (0) (SQLDriverConnect)')'. I am putting for the driver name "SQL Server" – Pedro Alves Apr 19 '20 at 16:38
We cannot see your *driver* so you must adjust accordingly. See [docs](https://docs.sqlalchemy.org/en/13/dialects/mssql.html#module-sqlalchemy.dialects.mssql.pyodbc) where spaces must be replaced by `+`. Maybe: `...driver=ODBC+{5}.format(..., driver.replace(' ', '+')`. Test with hard-coded values before dynamic variables. – Parfait Apr 19 '20 at 16:47
Yes, @Parfait has the point. But I will edit my question to be a generic driver – Michel Guimarães Apr 19 '20 at 17:01

score 2 · Answer 2 · answered Apr 19 '20 at 16:43

For raw DB-API insert query from Pandas, consider DataFrame.to_numpy() with executemany and avoid any top layer for looping. However, explicit columns must be used in append query. Adjust below columns and qmark parameter placeholders to correspond to data frame columns.

# PREPARED STATEMENT
sql = '''INSERT INTO [dbo].[new_date] (Col1, Col2, Col3, ...)
         VALUES (?, ?, ?, ...)
      '''

#  EXECUTE PARAMETERIZED QUERY
cursor.executemany(sql, df.to_numpy().tolist())   
conn.commit()

(And by the way, it is best practice generally in SQL queries to always explicitly reference columns and avoid SELECT * for code readability, maintainability, and even performance.)

Rajat Mishra · Answer 3 · 2020-04-19T16:36:51.980

cursor.execute doesnot accepts keyword arguments. One way of doing the insert can be using following below code snippet.

cols = "`,`".join([str(i) for i in df.columns.tolist()])

# Insert DataFrame recrds one by one.
for i,row in df.iterrows():
    sql = "INSERT INTO `[dbo].[new_date]` (`" +cols + "`) VALUES (" + "?,"*(len(row)-1) + "%s)"
    cursor.execute(sql, tuple(row))

here you are iterating through each row and then inserting it into the table.

score 1 · Accepted Answer · edited Dec 23 '20 at 12:14

See below my favourite solution, with UPSERT statement included.

df_columns = list(df)
columns = ','.join(df_columns)
values = 'VALUES({})'.format(','.join(['%s' for col in df_columns]))
update_list = ['{} = EXCLUDED.{}'.format(col, col) for col in df_columns]
update_str = ','.join(update_list)
insert_stmt = "INSERT INTO {} ({}) {} ON CONFLICT ([your_pkey_here]) DO UPDATE SET {}".format(table, columns, values, update_str)

score 0 · Answer 5 · answered Apr 19 '20 at 18:09

thank you for your answers :) but I use the following code to solve my problem:

params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=servername;DATABASE=database;UID=user;PWD=pass")
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
engine.connect()
query = query
df = pd.read_sql_query(query, connection)
df.to_sql(name='new_table',con=engine, index=False, if_exists='append')

INSERT INTO SELECT based on a dataframe

5 Answers5