I recently found out about sqlalchemy
in Python. I'd like to use it for data science rather than website applications.
I've been reading about it and I like that you can translate the sql queries into Python.
The main thing that I am confused about that I'm doing is:
Since I'm reading data from an already well established schema, I wish I didn't have to create the corresponding models myself.
I am able to get around that reading the metadata for the table and then just querying the tables and columns.
The problem is when I want to join to other tables, this metadata reading is taking too long each time, so I'm wondering if it makes sense to pickle cache it in an object, or if there's another built in method for that.
Edit: Include code. Noticed that the waiting time was due to an error in the loading function, rather than how to use the engine. Still leaving the code in case people comment something useful. Cheers.
The code I'm using is the following:
def reflect_engine(engine, update):
store = f'cache/meta_{engine.logging_name}.pkl'
if update or not os.path.isfile(store):
meta = alq.MetaData()
meta.reflect(bind=engine)
with open(store, "wb") as opened:
pkl.dump(meta, opened)
else:
with open(store, "r") as opened:
meta = pkl.load(opened)
return meta
def begin_session(engine):
session = alq.orm.sessionmaker(bind=engine)
return session()
Then I use the metadata object to get my queries...
def get_some_cars(engine, metadata):
session = begin_session(engine)
Cars = metadata.tables['Cars']
Makes = metadata.tables['CarManufacturers']
cars_cols = [ getattr(Cars.c, each_one) for each_one in [
'car_id',
'car_selling_status',
'car_purchased_date',
'car_purchase_price_car']] + [
Makes.c.car_manufacturer_name]
statuses = {
'selling' : ['AVAILABLE','RESERVED'],
'physical' : ['ATOURLOCATION'] }
inventory_conditions = alq.and_(
Cars.c.purchase_channel == "Inspection",
Cars.c.car_selling_status.in_( statuses['selling' ]),
Cars.c.car_physical_status.in_(statuses['physical']),)
the_query = ( session.query(*cars_cols).
join(Makes, Cars.c.car_manufacturer_id == Makes.c.car_manufacturer_id).
filter(inventory_conditions).
statement )
the_inventory = pd.read_sql(the_query, engine)
return the_inventory