
I encountered a pyodbc DataError:

sqlalchemy.exc.DataError: (pyodbc.DataError) ('22018', '[22018] [Microsoft][ODBC Driver 17 for SQL Server]Invalid character value for cast specification (0) (SQLExecute)')

while trying to insert into a db table with dataframe.to_sql() after reading a CSV file (using Pandas). ALL empty values in my dataframe are None.

BTW, I encountered this error while processing only one file; the program executed successfully for two other files (containing approx. 500K rows each). Also, for the files where the data was successfully inserted, the db table had NULL in place of the None values in the dataframe.
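For reference, the NaN-to-None conversion was along these lines (a simplified sketch, not the exact code; the file name is just a placeholder):

import pandas as pd

df = pd.read_csv('some_file.txt', sep='\t', low_memory=False)
# cast to object first so None is preserved instead of being coerced back to NaN
df = df.astype(object).where(df.notna(), None)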

Setting fast_executemany = False should work, and it does, as mentioned here. When I tried it, I didn't encounter the error. The problem is that it is painfully slow for the files I am dealing with (at least 200K rows each).

Environment: Windows 10, Python 3.9.6, pyodbc 4.0.32

Is there a different way?

Code below:

import sqlalchemy as sq, pyodbc as po, os, pandas as pd

conn = sq.create_engine(f'mssql+pyodbc://dbname/tablename?trusted_connection=yes&driver=ODBC Driver 17 for SQL Server',fast_executemany = True)
#print(conn)

os.chdir(r"some path")

col_types = {
    'col 1':sq.types.INTEGER(),
    'col 2':sq.types.VARCHAR(length=100),
    'col 3':sq.types.INTEGER(),
    'col 4':sq.types.VARCHAR(length=100),
    'col 5':sq.types.DateTime(),
    'col 6':sq.types.VARCHAR(length=5),
    'col 7':sq.types.DateTime(),
    'col 8':sq.types.VARCHAR(length=5),
    'col 9':sq.types.DateTime(),
    'col 10':sq.types.VARCHAR(length=5),
    'col 11':sq.types.DateTime(),
    'col 12':sq.types.VARCHAR(length=5),
    'col 13':sq.types.Float(),
    'col 14':sq.types.Float(),
    'col 15':sq.types.Float(),
    'col 16':sq.types.INTEGER(),
    'col 17':sq.types.VARCHAR(length=1000),
    'col 18':sq.types.VARCHAR(length=100),
    'col 19':sq.types.VARCHAR(length=1000),
    'col 20':sq.types.DateTime(),
    'col 21':sq.types.VARCHAR(length=5),
    'col 22':sq.types.DateTime(),
    'col 23':sq.types.VARCHAR(length=5),
    'col 24':sq.types.VARCHAR(length=50),
    'col 25':sq.types.VARCHAR(length=50),
    'col 26':sq.types.Float(),
    'col 27':sq.types.Float(),
    'col 28':sq.types.Float(),
    'col 29':sq.types.VARCHAR(length=150),
    'col 30':sq.types.VARCHAR(length=1000),
    'col 31':sq.types.VARCHAR(length=1000),
    'col 32':sq.types.VARCHAR(length=100),
    'col 33':sq.types.VARCHAR(length=100),
    'col 34':sq.types.INTEGER(),
    'col 35':sq.types.VARCHAR(length=100),
    'col 36':sq.types.Float(),
    'col 37':sq.types.Float(),
    'col 38':sq.types.VARCHAR(length=10),
    'col 39':sq.types.Float(),
    'col 40':sq.types.VARCHAR(length=1000),
    'col 41':sq.types.VARCHAR(length=20)
}

for f in os.listdir():
    if f.endswith(".txt"):
        df = pd.read_csv(f, sep='\t', low_memory=False)
        df.to_sql('tablename', con=conn, if_exists='append', index=False, dtype=col_types)
  • I strongly suspect that you actually do have one or more empty strings in the DataFrame and you don't realize it. – Gord Thompson Feb 01 '22 at 18:18
  • I replaced all NaN values with None and checked to see if it would work. it didn't :( – Sharath Kallaje Feb 01 '22 at 19:57
  • `NaN` values are not the problem; empty strings are. – Gord Thompson Feb 01 '22 at 21:36
  • I read the first 10 rows of the file into the dataframe and tried loading them into the table. It failed with the error in question. I checked if there were any empty strings in those 10 rows and couldn't find even one: `res = df[df['col_name'] == ''].index` # returns an empty array of indices. And yes, I tried it for all datetime columns – Sharath Kallaje Feb 01 '22 at 22:53
  • With `fast_executemany=True`, empty strings can be a problem for float columns, too. – Gord Thompson Feb 01 '22 at 23:11
  • … and integer columns as well. – Gord Thompson Feb 01 '22 at 23:26
  • 5
    Finally, a breakthrough! yesterday night, one of the many potential solutions I had made a list of was the one below. **convert all datetime strings in the dateframe to datetime objects** **df['column_name'] = pd.to_datetime(df['column_name'])** works!! I remembered reading somewhere (maybe on github) where you argued that converting these datetime strings into python 'datetime objects' would actually prevent this error from occurring in the case of fast_executemany=True. I believe you had made your case by demonstrating datetime.isoformat() in python. That helped :) Thanks :) – Sharath Kallaje Feb 01 '22 at 23:51
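Based on that last comment, a minimal sketch of the fix applied to the loop above: convert every column declared as DateTime in col_types to real datetime objects before calling to_sql(), so fast_executemany=True no longer has to cast date strings. This reuses conn and col_types from the question; errors='coerce' (turning unparseable values into NaT) is an assumption, not something confirmed in the comments.

datetime_cols = [name for name, typ in col_types.items()
                 if isinstance(typ, sq.types.DateTime)]

for f in os.listdir():
    if f.endswith(".txt"):
        df = pd.read_csv(f, sep='\t', low_memory=False)
        for col in datetime_cols:
            # parse the string column into real datetimes; unparseable values become NaT
            df[col] = pd.to_datetime(df[col], errors='coerce')
        df.to_sql('tablename', con=conn, if_exists='append', index=False, dtype=col_types)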

0 Answers