I'm using the latest versions of Python, pandas, pyodbc, SQL Server Management Studio, and SQLAlchemy.
I have the following code:
from sqlalchemy import create_engine
from sqlalchemy.engine import URL

connection_string = f'DRIVER={driver};SERVER={server};DATABASE={database};ENCRYPT=yes;UID={username};PWD={password}'
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
engine = create_engine(connection_url)
import pandas as pd

df = pd.read_csv('some-file.csv',
                 sep='|',
                 skiprows=[1],
                 on_bad_lines='warn',
                 encoding='latin1')
The CSV file has 179,098 rows and 28 columns. Then I have the following:
df.to_sql('some-file-column',
          con=engine,
          if_exists='append',
          index=False,
          method='multi',
          chunksize=1000)
When executing this, I get:
(pyodbc.Error) ('07002', '[07002] [Microsoft][ODBC Driver 17 for SQL Server]COUNT field incorrect or syntax error (0) (SQLExecDirectW)')
If I delete "chunksize" from the parameters (i.e., letting all records be loaded at once), I get the following error:
(pyodbc.ProgrammingError) ('The SQL contains -31528 parameter markers, but 5014744 parameters were supplied', 'HY000')
I get the same error, but with different "parameter markers" and "parameters supplied" counts, depending on whether I add "chunksize" back and on the number I use for the chunks.
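The numbers do seem consistent with an integer overflow in the parameter-marker count, though this is just my own arithmetic, not something I've confirmed in the driver: 179,098 rows × 28 columns is exactly the 5,014,744 "parameters supplied", and that count reinterpreted as a signed 16-bit integer gives the -31,528 from the error message:

rows, cols = 179_098, 28
params = rows * cols                         # 5_014_744 -- the "parameters supplied"
wrapped = (params + 2**15) % 2**16 - 2**15   # reinterpret as a signed 16-bit integer
print(params, wrapped)                       # 5014744 -31528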
More importantly, when I don't use "method = 'multi'", the database loads perfectly fine, down to the last record. Granted, it does so very slowly, which makes it unusable. It also loads fine when I limit the dataframe rows read from the CSV with, for example, "nrows = 100", but beyond that it throws an error.
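My working theory, and it is only a guess, is that SQL Server caps a single statement at 2,100 parameters, so with "method = 'multi'" and "chunksize = 1000" each INSERT would carry 1000 × 28 = 28,000 markers, far over that cap, and a chunk would have to stay under 2100 / 28 ≈ 75 rows (though that doesn't quite square with "nrows = 100" working, so I may be wrong). A sketch of what I plan to try, assuming that limit really is the constraint:

# assuming SQL Server's 2,100-parameters-per-statement limit is the constraint
safe_chunksize = 2100 // len(df.columns) - 1   # 2100 // 28 - 1 = 74 rows per chunk
df.to_sql('some-file-column',
          con=engine,
          if_exists='append',
          index=False,
          method='multi',
          chunksize=safe_chunksize)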
As an alternative, I have tried using pyodbc directly:
import pyodbc
cursor = pyodbc.connect(connection_string).cursor()  # plain pyodbc connection
cursor.fast_executemany = True
cursor.executemany("insertion_string_goes_here", df.values.tolist())
which works perfectly fine.
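I'm also aware that, if I've understood the SQLAlchemy docs correctly, the mssql+pyodbc dialect can enable this same pyodbc fast path on the engine itself, in which case to_sql without "method = 'multi'" should use it:

# fast_executemany is passed through to pyodbc by the mssql+pyodbc dialect
engine = create_engine(connection_url, fast_executemany=True)
df.to_sql('some-file-column', con=engine, if_exists='append', index=False, chunksize=1000)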
Help? I haven't found any answer as to why this happens. I need some of SQLAlchemy's functionality (executing .sql scripts from Python, which pyodbc doesn't allow AFAIK), which is why I don't simply stay with pyodbc.
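For context, this is the kind of thing I need SQLAlchemy for (the file name is just an example):

from sqlalchemy import text

with engine.begin() as conn:                      # transaction-scoped connection
    with open('some_script.sql', encoding='utf-8') as f:
        conn.execute(text(f.read()))              # run the script's statements

(I realize scripts containing GO would need to be split first, since GO is an SSMS batch separator rather than T-SQL.)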