I have several large pandas DataFrames (30k+ rows each) and need to upload a new version of them to a MS SQL Server database every day. I am trying to do so with the pandas to_sql function. Sometimes it works; other times it fails silently, behaving as if all of the data had been uploaded even though not a single row ends up in the table.
Here is my code:
import urllib.parse
import sqlalchemy


class SQLServerHandler(DataBaseHandler):
    ...
    def _getSQLAlchemyEngine(self):
        '''
        Get an sqlalchemy engine
        from the connection string

        The fast_executemany option fails silently:
        https://stackoverflow.com/questions/48307008/pandas-to-sql-doesnt-insert-any-data-in-my-table/55406717
        '''
        # escape special characters as required by sqlalchemy
        dbParams = urllib.parse.quote_plus(self.connectionString)
        # create engine
        engine = sqlalchemy.create_engine(
            'mssql+pyodbc:///?odbc_connect={}'.format(dbParams))
        return engine

    @logExecutionTime('Time taken to upload dataframe:')
    def uploadData(self, tableName, dataBaseSchema, dataFrame):
        '''
        Upload a pandas dataFrame
        to a database table <tableName>
        '''
        engine = self._getSQLAlchemyEngine()
        dataFrame.to_sql(
            tableName,
            con=engine,
            index=False,
            if_exists='append',
            method='multi',
            chunksize=50,
            schema=dataBaseSchema)
Switching method to None seems to work properly, but then the data takes an unreasonably long time to upload (30+ minutes). With 20 or so tables of this size to upload every day, that rules this option out.
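For reference, the slow variant differs only in the method argument; a minimal sketch, reusing the same variables as in uploadData above:

    # method=None is the default: pandas inserts the rows chunk by chunk with a
    # plain executemany instead of building multi-row INSERT statements
    dataFrame.to_sql(
        tableName,
        con=engine,
        index=False,
        if_exists='append',
        method=None,
        chunksize=50,
        schema=dataBaseSchema)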
The solution proposed here, adding the schema as a parameter, doesn't work. Neither does creating a sqlalchemy session and passing it to the con parameter with session.get_bind().
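To be explicit, the session attempt looked roughly like this (a sketch; the sessionmaker usage is my reconstruction, the variable names are the same as above):

    from sqlalchemy.orm import sessionmaker

    # sketch of the session-based attempt; it can still fail silently in the same way
    Session = sessionmaker(bind=engine)
    session = Session()
    try:
        dataFrame.to_sql(
            tableName,
            con=session.get_bind(),   # hands the underlying engine back to to_sql
            index=False,
            if_exists='append',
            method='multi',
            chunksize=50,
            schema=dataBaseSchema)
        session.commit()
    finally:
        session.close()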
I am using:
- ODBC Driver 17 for SQL Server
- pandas 1.2.1
- sqlalchemy 1.3.22
- pyodbc 4.0.30
Does anyone know how to make it raise an exception if it fails?
Or why it is not uploading any data?
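For completeness, a rough sketch of the kind of after-the-fact row-count check that reveals the silent failure; the countRows helper and the handler instance are illustrative, not part of the class above:

    import sqlalchemy

    def countRows(engine, schema, tableName):
        # illustrative helper: count the rows currently in the target table
        query = sqlalchemy.text(
            'SELECT COUNT(*) FROM [{}].[{}]'.format(schema, tableName))
        with engine.connect() as connection:
            return connection.execute(query).scalar()

    engine = handler._getSQLAlchemyEngine()          # handler: a SQLServerHandler
    before = countRows(engine, dataBaseSchema, tableName)
    handler.uploadData(tableName, dataBaseSchema, dataFrame)
    after = countRows(engine, dataBaseSchema, tableName)
    if after - before != len(dataFrame):
        raise RuntimeError(
            'to_sql returned without error but the rows were not inserted')

Obviously I would much rather have to_sql (or sqlalchemy/pyodbc) raise the error itself than rely on a check like this.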