
I'm loading a batch of CSV files into a SQL Server table using Python one row at a time. The files each contain a number of free text fields and erroneous data which I trim and rename before attempting to insert.

In general (about 95% of the time) the code works, but some rows raise an exception with the error message shown below.

I'm confused as a) I only have four columns in my table, and can't understand why it would be looking for Parameter 7, and b) the text columns are being loaded into nvarchar(max) formatted columns, so I wouldn't expect a data type error.

I've checked the source files to see which rows threw an error, but there seems to be no discernible difference between the problem rows and the others that load successfully.

I've trimmed the process right back to only insert the JobID (as a bigint) and it works without issue, but as soon as I bring in the text fields, it causes an error.

I'm using Python 3.7.0 and loading into SQL Server 14.0.

import numpy as np
import pyodbc
import os
import glob
import pandas as pd
import csv
import config
import urllib
import shutil
import codecs

path = "C:\\myFilePath"

allFiles = glob.glob(os.path.join(path, "*.csv"))

# Read each CSV once and stack them into a single dataframe
df = pd.concat((pd.read_csv(file_, encoding='utf8') for file_ in allFiles), ignore_index=True)

cnxn = pyodbc.connect("Driver={ODBC Driver 13 for SQL Server};"
                      "Server=myServer;"
                      "Database=myDatabase;"
                      "Trusted_Connection=yes;"
                      "SelectMethod=cursor;"
                      )

df2 = df[['JobID', 'NPS_score', 'Obtuse_Column_Name_1', 'Obtuse_Column_Name_2']].copy()

df2.columns = ['JobID', 'Score','Q1', 'Q2']

cursor = cnxn.cursor()
counter = 0
for index, row in df2.iterrows():
    try:
        counter = counter + 1
        cursor.execute("""insert into [myDB].[dbo].[input_test]( [JobID], [Score], [Q1], [Q2]) VALUES (?, ?, ?, ?)""", row['JobID'],row['Score'],row['Q1'], row['Q2'])
        cursor.commit()
        print(counter)
    except Exception as e:
        print(e)
        continue
cursor.close()
cnxn.close()

I expect the data to be loaded, but some rows produce the following error:

('42000', '[42000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 7 (""): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision. (8023) (SQLExecDirectW)')

Ron Sibayan
  • I would first try to debug by printing the insert statement. It seems that the Q2 value is empty string and is illegal given the table definition. BTW why not use pandas to sql? https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html – Tom Ron Dec 31 '18 at 11:18
  • Thanks @TomRon, I did check to see if Q2 was blank but it wasn't. Also, the table should allow for nulls in those columns (as many people just give a score without question text). I also did try to_sql first but ran into a problem when it went over 1000 rows. I thought going line by line would be a suitable alternative – Ron Sibayan Dec 31 '18 at 11:29
  • 1
    Hi @Ron Sibayan and welcome! Since the problem only occurs for some rows in your data, I think it might help if you could identify that row and rewrite your example to include it (or an obfuscated example that shows the same behaviour). It is often difficult to get help on non-reproducible questions. – steinar Dec 31 '18 at 11:29
  • Thank you @steinar, I'll try and pull out a few examples right now – Ron Sibayan Dec 31 '18 at 11:30
  • what are the type of columns in the database? – Salman A Dec 31 '18 at 12:05
  • @SalmanA I have 1 bigint for the JobID and 3 nvarchar(max) – Ron Sibayan Dec 31 '18 at 12:06
  • struggling to write a snippet that replicates this error without submitted the csv (which I can't do for commercial reasons). I'll go back to to_sql and see if I can make that work – Ron Sibayan Dec 31 '18 at 12:33
  • @RonSibayan hello, please provide the full structure of the table (field names and types) and a sample of the lines you're trying to insert. – Sanpas Dec 31 '18 at 13:04
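On the to_sql route mentioned in the comments: if the failure past 1000 rows came from building one large multi-row INSERT, passing a chunksize below 1000 is one way around it. A sketch with made-up data, using an in-memory SQLite database as a stand-in for the real SQL Server connection (against SQL Server you would pass a SQLAlchemy engine instead):

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for the cleaned dataframe df2
df2 = pd.DataFrame({
    "JobID": [1, 2],
    "Score": ["9", "7"],
    "Q1": ["text a", "text b"],
    "Q2": ["text c", None],
})

# In-memory SQLite plays the role of SQL Server here
cnxn = sqlite3.connect(":memory:")

# chunksize caps how many rows are written per batch
df2.to_sql("input_test", cnxn, index=False, if_exists="replace", chunksize=500)

rows = pd.read_sql("select count(*) as n from input_test", cnxn).iloc[0, 0]
print(rows)  # 2
```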

3 Answers


I have come across a similar error while dumping data into a SQL table.

I exported the data to a .csv file and found that two columns contained inf values.

So I replaced the inf values with NaN and it worked :)

df2.replace({np.inf: np.nan, -np.inf: np.nan}, inplace=True)

and fillna(0) to replace NaN with 0:

df2 = df2.fillna(0)
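Put together, a minimal runnable sketch of both replacements, using a small made-up frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: inf values in a numeric column and a
# missing value in a text column
df2 = pd.DataFrame({"Score": [9.0, np.inf, -np.inf],
                    "Q1": ["good", None, "ok"]})

# First map inf/-inf to NaN, then fill every NaN with 0
df2.replace({np.inf: np.nan, -np.inf: np.nan}, inplace=True)
df2 = df2.fillna(0)

print(df2["Score"].tolist())  # [9.0, 0.0, 0.0]
```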

PS: This answer might help someone else too.

Suhas_Pote
    This replace syntax did not work for me. I had to use df2.replace({np.inf: np.nan, -np.inf: np.nan}, inplace=True) – matt123788 Jul 01 '21 at 17:38

I have found and solved the issue. Following @TomRon's advice, I printed the insert statement in the error block. This showed that although I assumed Q2 was populated (as it was in the CSV), once loaded into a dataframe it was for some reason NaN.

I simply replaced NaN with 0 and it now loads all rows successfully, using the following line of code right after creating the dataframe I want to load (df2):

df2 = df2.fillna(value=0)
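For anyone hitting the same thing, here is a minimal illustration (with made-up data) of how an empty CSV cell becomes a float NaN even in an otherwise-text column, which is why the driver reports a float type error on what looks like a text parameter:

```python
import io
import math
import pandas as pd

# Hypothetical two-row CSV; the second row leaves Q2 empty
csv_text = "JobID,Score,Q1,Q2\n1,9,good,fast\n2,7,ok,\n"
df = pd.read_csv(io.StringIO(csv_text))

# The empty Q2 cell is parsed as float('nan'), not an empty string,
# so the value bound for that row is a float
print(type(df.loc[1, "Q2"]))  # <class 'float'>

# After fillna the cell holds 0 instead of NaN
print(df.fillna(value=0).loc[1, "Q2"])  # 0
```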
Ron Sibayan

You should replace NaN with None using the code below:

df = df.where(pd.notnull(df), None)
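For example (hypothetical data), a missing text value then reaches the driver as None, which pyodbc binds as SQL NULL rather than a float:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing text value
df = pd.DataFrame({"JobID": [1, 2], "Q2": ["fast", np.nan]})

# where() keeps cells where the condition holds and substitutes
# None everywhere else, turning NaN into a proper Python None
df = df.where(pd.notnull(df), None)

print(df.loc[1, "Q2"])  # None
```

This is preferable to fillna(0) when the target columns are nullable and you want genuine NULLs rather than zeroes in the table.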