I am reading tables from an ODBC data source via pyodbc and fetching/loading all rows with pandas read_sql(); the sql string is a plain SELECT * FROM <table>. However, there are 200+ tables and some have 100,000 rows, so I have been using chunksize in read_sql() to read and load the data in chunks and gain some read performance.
Below is sample code:
import json
import sys

import pandas as pd
import pyodbc

def get_odbc_tables(dsn, uid, pwd):
    try:
        cnxn = pyodbc.connect('DSN={};UID={};PWD={}'.format(dsn, uid, pwd), autocommit=True)
        # Get data into a pandas dataframe, chunk by chunk
        dfl = []
        df = pd.DataFrame()
        # sql holds the SELECT * query for the current table (built elsewhere)
        for chunk in pd.read_sql(sql, cnxn, chunksize=10000):
            dfl.append(chunk)
            df = pd.concat(dfl, ignore_index=True)
            records = json.loads(df.T.to_json()).values()
            print("Load to Target")
            # ...... (load records to the target here)
            cnxn.close()
    except Exception as e:
        print("Error: {}".format(str(e)))
        sys.exit(1)
However, I always get the error below after pandas has read/processed the first chunk of the specified chunksize (10,000) defined in read_sql() and loaded it to the target:
Error: The cursor's connection has been closed
If chunksize is increased to 50,000, it errors out again with the same message once it has processed/loaded just 50,000 records, even though the source has more records than that. This also causes the program to fail.
C:\Program Files (x86)\Python\lib\site-packages\pandas\io\sql.py in _query_iterator(cursor, chunksize, columns, index_col, coerce_float, parse_dates)
1419 while True:
-> 1420 data = cursor.fetchmany(chunksize)
1421 if type(data) == tuple:
ProgrammingError: The cursor's connection has been closed.
During handling of the above exception, another exception occurred:
SystemExit Traceback (most recent call last)
<ipython-input-127-b106daee9737> in <module>()
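For context, my reading of the traceback is that read_sql() with a chunksize does not fetch everything up front: it returns a generator that calls cursor.fetchmany(chunksize) each time the loop asks for the next chunk, so presumably the pyodbc connection has to stay open until the last chunk has been consumed. A rough, simplified sketch of that behaviour (read_in_chunks is my own illustrative name, not the pandas API):

import pandas as pd

def read_in_chunks(sql, cnxn, chunksize):
    # Simplified illustration of a chunked read: each iteration pulls the next
    # batch with fetchmany(), so the connection/cursor must remain usable
    # between iterations of the calling loop.
    cursor = cnxn.cursor()
    cursor.execute(sql)
    columns = [col[0] for col in cursor.description]
    while True:
        rows = cursor.fetchmany(chunksize)  # fails if the connection was closed
        if not rows:
            break
        yield pd.DataFrame.from_records([tuple(r) for r in rows], columns=columns)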
Please suggest if there's any way to handle this. The source is exposed only as an ODBC data source connection, so I think I can't create an SQLAlchemy engine for an ODBC data source.
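For what it's worth, the only SQLAlchemy pattern I have come across layers a database-specific dialect on top of pyodbc, roughly like the sketch below; I assume this only applies when the DSN points at a database SQLAlchemy has a dialect for (the mssql+pyodbc dialect, DSN name and credentials here are placeholders, not my actual source):

# Hypothetical sketch: a SQLAlchemy engine built on top of an ODBC DSN.
# Assumes the DSN points at SQL Server (mssql+pyodbc dialect); my source
# may not have a matching SQLAlchemy dialect at all.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mssql+pyodbc://my_uid:my_pwd@my_dsn')
for chunk in pd.read_sql('SELECT * FROM some_table', engine, chunksize=10000):
    print(len(chunk))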