
I recently found out about SQLAlchemy in Python, and I'd like to use it for data science rather than web applications.
I've been reading about it, and I like that you can translate SQL queries into Python.
The main thing I'm confused about is this:

Since I'm reading data from an already well-established schema, I'd rather not create the corresponding models myself.
I can get around that by reading the metadata for each table and then just querying the tables and columns. The problem is that when I want to join to other tables, this metadata reading takes too long each time, so I'm wondering whether it makes sense to cache it in a pickled object, or whether there's a built-in method for that.
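
One thing that already helps: `MetaData.reflect()` accepts an `only=` argument, so I can reflect just the tables a query needs instead of the whole schema (the connection string here is a placeholder):

import sqlalchemy as alq

engine = alq.create_engine('mssql+pymssql://user:pass@host/db')  # placeholder URL

meta = alq.MetaData()
meta.reflect(bind=engine, only=['Cars', 'CarManufacturers'])  # skip the rest of the schema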

Edit: Included code. I noticed that the waiting time was due to a bug in the loading function, rather than to how the engine is used. Still leaving the code here in case people comment something useful. Cheers.

The code I'm using is the following:

import os
import pickle as pkl
import sqlalchemy as alq

def reflect_engine(engine, update):
  # Cache the reflected metadata on disk, so the schema is only
  # introspected when an update is forced or no cache file exists yet.
  store = f'cache/meta_{engine.logging_name}.pkl'

  if update or not os.path.isfile(store):
    meta = alq.MetaData()
    meta.reflect(bind=engine)
    with open(store, "wb") as opened:
      pkl.dump(meta, opened)
  else:
    with open(store, "rb") as opened:  # was "r": pickle files must be opened in binary mode
      meta = pkl.load(opened)
  return meta
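
A minimal usage sketch (the connection URL is a placeholder, and it assumes the `cache/` folder exists):

engine = alq.create_engine('mssql+pymssql://user:pass@host/db')  # placeholder URL
metadata = reflect_engine(engine, update=False)  # loads the pickle when a cache file exists
print(list(metadata.tables))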


import sqlalchemy.orm  # ensures alq.orm is importable as an attribute

def begin_session(engine):
  # Bind a session factory to the engine and return a fresh session.
  session = alq.orm.sessionmaker(bind=engine)
  return session()

Then I use the metadata object to build my queries:

import pandas as pd

def get_some_cars(engine, metadata):
  session = begin_session(engine)

  Cars  = metadata.tables['Cars']
  Makes = metadata.tables['CarManufacturers']

  # Columns to select: a few from Cars, plus the manufacturer's name.
  cars_cols = [getattr(Cars.c, each_one) for each_one in [
      'car_id',
      'car_selling_status',
      'car_purchased_date',
      'car_purchase_price_car']] + [
      Makes.c.car_manufacturer_name]

  statuses = {
      'selling'  : ['AVAILABLE', 'RESERVED'],
      'physical' : ['ATOURLOCATION']}

  inventory_conditions = alq.and_(
      Cars.c.purchase_channel == "Inspection",
      Cars.c.car_selling_status.in_(statuses['selling']),
      Cars.c.car_physical_status.in_(statuses['physical']))

  # Build the SELECT with a join on the manufacturer id, then let pandas execute it.
  the_query = (session.query(*cars_cols)
      .join(Makes, Cars.c.car_manufacturer_id == Makes.c.car_manufacturer_id)
      .filter(inventory_conditions)
      .statement)

  the_inventory = pd.read_sql(the_query, engine)
  return the_inventory
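
Since this is read-only work, the same query can also be built with Core alone, no session at all; this is a sketch in the 1.x style where `select()` takes a list of columns (2.0 takes them positionally):

def get_some_cars_core(engine, metadata):
  Cars  = metadata.tables['Cars']
  Makes = metadata.tables['CarManufacturers']
  stmt = (alq.select([Cars.c.car_id, Makes.c.car_manufacturer_name])
      .select_from(Cars.join(
          Makes, Cars.c.car_manufacturer_id == Makes.c.car_manufacturer_id))
      .where(Cars.c.car_selling_status.in_(['AVAILABLE', 'RESERVED'])))
  return pd.read_sql(stmt, engine)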
• Yes, the SQLAlchemy ORM is built upon the [‘core’](https://docs.sqlalchemy.org/en/latest/core/) and you can use that independent of the ORM. What you are looking for is [`MetaData.reflect()`](https://docs.sqlalchemy.org/en/latest/core/reflection.html#reflecting-all-tables-at-once). – SuperShoot Feb 11 '19 at 23:19
• Docs say this about pickling metadata: `Regular pickle.loads()/dumps() can be used to fully dump any MetaData object, typically one which was reflected from an existing database at some previous point in time.` https://docs.sqlalchemy.org/en/latest/core/serializer.html – SuperShoot Feb 11 '19 at 23:31
  • Please include an example of how you're reflecting the tables currently. "The problem is when I want to join to other tables, this metadata reading is taking too long each time" sounds like you're doing a bit more work than necessary. – Ilja Everilä Feb 12 '19 at 05:47
• @IljaEverilä sorry for the delay. Hope this sheds some light for people doing similar things. Thanks. – Diego-MX Mar 06 '19 at 16:07
