
I am trying to load a SQL Server table from a CSV file that contains blank values in some of the columns. I am using Python 3.7 with pandas and pyodbc to perform this operation.

Below is my DataFrame -

   ID Name    City  Salary
   1  ABC    Pune   100.0
   2  DEF     NaN   200.0
   3  GHI  Mumbai   300.0
   4  JKL     NaN     NaN
   5  MNO   Delhi     NaN

and below is the code that I am using -

import pandas as pd
import pyodbc 

conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};Server=test;Port=test;database=test;UID=test;PWD=test')
cursor = conn.cursor()
df = pd.read_csv(r"E:\test\PyODBC NaN Handling\test.csv")

for row in df.itertuples():
    cursor.execute(
        '''
        INSERT INTO test.dbo.test ([ID],[Name],[City],[Salary])
        VALUES (?,?,?,?)
        ''',
        row.ID, row.Name, row.City, row.Salary
    )
conn.commit()

This gives me the below error -

pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]
The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 6 (""): 
The supplied value is not a valid instance of data type float. 
Check the source data for invalid values. 
An example of an invalid value is data of numeric type with scale greater than precision. (8023) (SQLExecDirectW)')

Upon some research, I came to understand that pyodbc cannot bind NaN values, and the usual suggestion is to use df.fillna() to replace the blanks with some constant value (zero or a space).

reference - "Insert Into" statement causing errors due to "Parameter 7 (""): The supplied value is not a valid instance of data type float."

Can someone please help with any alternative solution to this problem that would insert Null values in the table without explicitly converting them to any constant value?

Thanks,

Sanket Kelkar

    @JustinEzequiel - Thanks for the guidance! Got it working using df = df.fillna(np.nan).replace([np.nan], [None]) – Sanket Kelkar Mar 11 '21 at 10:17
  • Not PyOdbc specific, but if your problem is "just" CSV -> ODBC you may want to give odbcsv a spin: https://github.com/pacman82/odbc-api/tree/master/odbcsv – Markus Klein Mar 15 '21 at 12:24
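
The one-liner from the comment above can be sketched as follows (the DataFrame values are made up to mirror the question; no database connection is needed to see the conversion). The `fillna(np.nan)` call first normalizes any blank/NA variants to NaN, and `replace([np.nan], [None])` then swaps NaN for Python `None`, which pyodbc binds as SQL NULL:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame mirroring two rows from the question
df = pd.DataFrame(
    {
        "ID": [1, 4],
        "Name": ["ABC", "JKL"],
        "City": ["Pune", np.nan],
        "Salary": [100.0, np.nan],
    }
)

# Normalize NA variants to NaN, then replace NaN with None so that
# pyodbc binds those cells as SQL NULL instead of an invalid float
df = df.fillna(np.nan).replace([np.nan], [None])

# City and Salary for the "JKL" row are now None, not NaN
print(df.iloc[1].tolist())
```

Note that `replace` converts the affected columns to `object` dtype, which is fine for row-by-row inserts via `cursor.execute()`.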

2 Answers


pandas' to_sql() method automatically converts NaN values to SQL NULL (which are represented in Python as None):

import io

import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine("mssql+pyodbc://scott:tiger^5HHH@mssql_199")

csv = io.StringIO(
    """\
Name,City,Salary
ABC,Pune,100.0
MNO,Delhi,
"""
)
df = pd.read_csv(csv)
print(df)
"""
  Name   City  Salary
0  ABC   Pune   100.0
1  MNO  Delhi     NaN
"""
df.to_sql("so66579476", engine, index=False, if_exists="replace")
with engine.connect() as conn:
    print(conn.exec_driver_sql("SELECT * FROM so66579476").all())
    # [('ABC', 'Pune', 100.0), ('MNO', 'Delhi', None)]
Gord Thompson

Try changing NaN to None like this: df2 = df.astype(object).where(pd.notnull(df), None)

See Python Pandas write to sql with NaN values
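
A minimal sketch of that recipe, with made-up values for illustration. Casting to `object` first prevents pandas from coercing `None` back to NaN, and `where(pd.notnull(df), None)` keeps each value where it is non-null and substitutes `None` everywhere else:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with a missing Salary value
df = pd.DataFrame({"Name": ["ABC", "MNO"], "Salary": [100.0, np.nan]})

# Cast to object so None survives, then replace null cells with None
df2 = df.astype(object).where(pd.notnull(df), None)

# The NaN Salary is now None and will be inserted as SQL NULL
print(df2["Salary"].tolist())
```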