
I am having problems loading data into an Access database. For testing purposes I built a little convert function that takes all datasets from an HDF file and writes them into the .accdb. Without the @event.listens_for(engine, "before_cursor_execute") functionality it works, but very slowly. With it, I get odd behavior: it creates only one empty table (from the first df) in the db and then execution finishes. The for loop is never completed and no error is raised.

Maybe it’s because the sqlalchemy-access package doesn’t support fast_executemany, but I couldn’t find any related information about it. Does anyone have some input on how I can solve this, or on how to write data into the db in a faster way?

Big thanks!

import urllib
from pathlib import Path

import pandas as pd
from sqlalchemy import create_engine, event

# PATHS
HOME = Path(__file__).parent
DATA_DIR = HOME / 'output'

FILE_ACCESS = DATA_DIR / 'db.accdb'
FILE_HDF5 = DATA_DIR / 'Data.hdf'

# FUNCTIONS
def convert_from_hdf_to_accb():
    # https://github.com/gordthompson/sqlalchemy-access/wiki/Getting-Connected
    driver = '{Microsoft Access Driver (*.mdb, *.accdb)}'
    conn_str = 'DRIVER={};DBQ={};'.format(driver, FILE_ACCESS)
    conn_url = "access+pyodbc:///?odbc_connect={}".format(urllib.parse.quote_plus(conn_str))

    # https://medium.com/analytics-vidhya/speed-up-bulk-inserts-to-sql-db-using-pandas-and-python-61707ae41990
    # https://github.com/pandas-dev/pandas/issues/15276
    # https://stackoverflow.com/questions/48006551/speeding-up-pandas-dataframe-to-sql-with-fast-executemany-of-pyodbc
    engine = create_engine(conn_url)

    @event.listens_for(engine, "before_cursor_execute")
    def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
        if executemany:
            cursor.fast_executemany = True

    with pd.HDFStore(path=FILE_HDF5, mode="r") as store:
        for key in store.keys():
            df = store.get(key)
            df.to_sql(name=key, con=engine, index=False, if_exists='replace')

            print(' IT NEVER REACHES AND DOESNT RAISE AN ERROR :( ')

# EXECUTE
if __name__ == "__main__":
    convert_from_hdf_to_accb()
    If your end goal is to efficiently import HDF files into Access, I'd do it directly from Access itself by using the HDF ODBC connector. If this is just a test, though, I don't really have an answer for you. – Erik A Jun 25 '20 at 11:40

1 Answer


Maybe it’s because the sqlalchemy-access package doesn’t support fast_executemany

That is true. pyodbc's fast_executemany feature requires that the driver support an internal ODBC mechanism called "parameter arrays", and the Microsoft Access ODBC driver does not support them.
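
So with the Access driver the listener buys you nothing, and you can only fall back to the regular row-by-row executemany. If you want to keep one convert function that still benefits from fast_executemany on drivers that do support it, a minimal sketch (not from the original answer; it assumes the "access" dialect name used by sqlalchemy-access and reuses conn_url and FILE_HDF5 from the question) would be to enable the flag only for non-Access dialects:

import pandas as pd
from sqlalchemy import create_engine, event

engine = create_engine(conn_url)  # conn_url built exactly as in the question

@event.listens_for(engine, "before_cursor_execute")
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    # fast_executemany relies on ODBC parameter arrays; the Access ODBC
    # driver does not implement them, so skip the flag for the "access"
    # dialect and only set it for drivers that can use it (e.g. mssql+pyodbc).
    if executemany and conn.dialect.name != "access":
        cursor.fast_executemany = True

with pd.HDFStore(path=FILE_HDF5, mode="r") as store:
    for key in store.keys():
        df = store.get(key)
        df.to_sql(name=key, con=engine, index=False, if_exists="replace")

Against Access this runs at the same (slow) speed as the version without the listener, but the same function keeps the speed-up when pointed at a driver that supports parameter arrays.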

See also

https://github.com/mkleehammer/pyodbc/wiki/Driver-support-for-fast_executemany
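
If you are not sure which ODBC driver a given connection actually uses (and hence whether it appears in that list), you can ask pyodbc directly. A small check, reusing conn_str from the question (the printed values are just examples):

import pyodbc

cnxn = pyodbc.connect(conn_str)  # conn_str built as in the question
# SQLGetInfo reports the driver behind the connection, e.g. 'ACEODBC.DLL'
# for the Microsoft Access driver; compare it against the wiki table above.
print(cnxn.getinfo(pyodbc.SQL_DRIVER_NAME))
print(cnxn.getinfo(pyodbc.SQL_DRIVER_VER))
cnxn.close()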

Gord Thompson