I am reading from a Microsoft SQL server instance. I need to read all data from a table, which is quite big (~4 million records). So I like to do that in chunks, so I can limit the memory usage of my Python program.
This works fine normally, but now I need to move where this runs, which forces it go over a not super stable connection (I believe VPN is sometimes throttling the connection). So occasionally I get a connection error in one of the chunks:
sqlalchemy.exc.OperationalError: (pyodbc.OperationalError) ('08S01', '[08S01] [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x68 (104) (
SQLGetData)')
The code I run comes down to this:
import pandas as pd
from sqlalchemy import create_engine
connection_string = 'mssql+pyodbc://DB_USER:DB_PASSWORD@DB_HOST/DB_NAME?trusted_connection=no&driver=ODBC+Driver+17+for+SQL+Server'
db = create_engine(connection_string, pool_pre_ping=True)
query = 'SELECT * FROM table'
for chunk in pd.read_sql_query(query, db, chunksize=500_000):
# do stuff with chunk
What I would like to know: is it possible to add a retry mechanism that can continue with the correct chunk if the connection fails? I've tried a few options, but none of them seem to be able to recover and continue at the same chunk.