I have scraped some baseball data from the web using Python and saved it in a Pandas dataframe. One of the columns contains the numeric player identifying code, but it is often null. Because Pandas integer columns cannot contain NaNs, this column has been converted to float and is saved as such when I export the data to csv.
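For illustration, here is a minimal example reproducing the coercion (the column name is made up for the example):

```python
import numpy as np
import pandas as pd

# A column of integer player IDs with one missing value
df = pd.DataFrame({"player_id": [12345, 67890, np.nan]})

# The NaN forces the whole column to float64
print(df["player_id"].dtype)  # float64

# So the csv contains values like "12345.0" rather than "12345"
df.to_csv("players.csv", index=False)
```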
This causes problems when I try to run a bulk insert into the MS SQL Server database, as the relevant column is defined to be of type 'INT' (and also carries a foreign key constraint to another table). The following error message is raised:
Bulk load data conversion error (type mismatch or invalid character for the specified codepage)
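For reference, the import is done along these lines (a rough sketch; the connection string, table name, file path, and options are placeholders, not my exact statement):

```python
import pyodbc

# Placeholder connection string, table, and file path
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=baseball;Trusted_Connection=yes;"
)
conn.cursor().execute(
    """
    BULK INSERT dbo.PlayerStats
    FROM 'C:\\data\\players.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', FIRSTROW = 2)
    """
)
conn.commit()
```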
I tried the suggestion provided here, but changing the format of the float does not seem to solve the problem: SQL Server still throws the same error.
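Concretely, the change amounted to something like this, reusing the df from the example above (I'm assuming the suggestion was pandas' float_format argument; the exact format string in the link may differ):

```python
# Write floats without a decimal part, so 12345.0 becomes "12345";
# missing values are still written as empty fields
df.to_csv("players.csv", index=False, float_format="%.0f")
```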
I am aware that packages such as pyodbc provide an alternative way of transferring data to a database, but my dataframe is quite large (approx. 6M rows, 70 cols) and I have found that process too slow, at least compared to saving to csv and bulk importing.
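For context, the pyodbc approach I mean is along these lines (a rough sketch; the connection string, table, and column names are placeholders):

```python
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=baseball;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.fast_executemany = True  # batch parameter sets rather than one round trip per row

# df is the scraped dataframe; NaN must become None so pyodbc sends NULL
# for the INT column instead of a float
rows = [
    tuple(None if pd.isna(v) else v for v in row)
    for row in df.itertuples(index=False, name=None)
]
cursor.executemany(
    "INSERT INTO dbo.PlayerStats (player_id, team) VALUES (?, ?)",
    rows,
)
conn.commit()
```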
How can I best solve this problem? Ultimately, I'm not wedded to saving to csv and using bulk insert in SQL Server, but I do want a method that is similarly fast.