0

I am extracting a table row and the corresponding row from all referenced tables via SQLAlchemy.

Given the following object structure:

class DNAExtractionProtocol(Base):
    __tablename__ = 'dna_extraction_protocols'
    id = Column(Integer, primary_key=True)
    code = Column(String, unique=True)
    name = Column(String)
    sample_mass = Column(Float)
    mass_unit_id = Column(String, ForeignKey('measurement_units.id'))
    mass_unit = relationship("MeasurementUnit", foreign_keys=[mass_unit_id])
    digestion_buffer_id = Column(String, ForeignKey("solutions.id"))
    digestion_buffer = relationship("Solution", foreign_keys=[digestion_buffer_id])
    digestion_buffer_volume = Column(Float)
    digestion_id = Column(Integer, ForeignKey("incubations.id"))
    digestion = relationship("Incubation", foreign_keys=[digestion_id])
    lysis_buffer_id = Column(String, ForeignKey("solutions.id"))
    lysis_buffer = relationship("Solution", foreign_keys=[lysis_buffer_id])
    lysis_buffer_volume = Column(Float)
    lysis_id = Column(Integer, ForeignKey("incubations.id"))
    lysis = relationship("Incubation", foreign_keys=[lysis_id])
    proteinase_id = Column(String, ForeignKey("solutions.id"))
    proteinase = relationship("Solution", foreign_keys=[proteinase_id])
    proteinase_volume = Column(Float)
    inactivation_id = Column(Integer, ForeignKey("incubations.id"))
    inactivation = relationship("Incubation", foreign_keys=[inactivation_id])
    cooling_id = Column(Integer, ForeignKey("incubations.id"))
    cooling = relationship("Incubation", foreign_keys=[cooling_id])
    centrifugation_id = Column(Integer, ForeignKey("incubations.id"))
    centrifugation = relationship("Incubation", foreign_keys=[centrifugation_id])

    volume_unit_id = Column(String, ForeignKey('measurement_units.id'))
    volume_unit = relationship("MeasurementUnit", foreign_keys=[volume_unit_id])

I am using:

sql_query = session.query(DNAExtractionProtocol).options(Load(DNAExtractionProtocol).joinedload("*")).filter(DNAExtractionProtocol.code == code)
for item in sql_query:
    pass
mystring = str(sql_query)
mydf = pd.read_sql_query(mystring,engine,params=[code])
print(mydf.columns)

This gives me:

Index([u'dna_extraction_protocols_id', u'dna_extraction_protocols_code',
       u'dna_extraction_protocols_name',
       u'dna_extraction_protocols_sample_mass',
       u'dna_extraction_protocols_mass_unit_id',
       u'dna_extraction_protocols_digestion_buffer_id',
       u'dna_extraction_protocols_digestion_buffer_volume',
       u'dna_extraction_protocols_digestion_id',
       u'dna_extraction_protocols_lysis_buffer_id',
       u'dna_extraction_protocols_lysis_buffer_volume',
       u'dna_extraction_protocols_lysis_id',
       u'dna_extraction_protocols_proteinase_id',
       u'dna_extraction_protocols_proteinase_volume',
       u'dna_extraction_protocols_inactivation_id',
       u'dna_extraction_protocols_cooling_id',
       u'dna_extraction_protocols_centrifugation_id',
       u'dna_extraction_protocols_volume_unit_id', u'measurement_units_1_id',
       u'measurement_units_1_code', u'measurement_units_1_long_name',
       u'measurement_units_1_siunitx', u'solutions_1_id', u'solutions_1_code',
       u'solutions_1_name', u'solutions_1_supplier',
       u'solutions_1_supplier_id', u'incubations_1_id', u'incubations_1_speed',
       u'incubations_1_duration', u'incubations_1_temperature',
       u'incubations_1_movement', u'incubations_1_speed_unit_id',
       u'incubations_1_duration_unit_id', u'incubations_1_temperature_unit_id',
       u'solutions_2_id', u'solutions_2_code', u'solutions_2_name',
       u'solutions_2_supplier', u'solutions_2_supplier_id',
       u'incubations_2_id', u'incubations_2_speed', u'incubations_2_duration',
       u'incubations_2_temperature', u'incubations_2_movement',
       u'incubations_2_speed_unit_id', u'incubations_2_duration_unit_id',
       u'incubations_2_temperature_unit_id', u'solutions_3_id',
       u'solutions_3_code', u'solutions_3_name', u'solutions_3_supplier',
       u'solutions_3_supplier_id', u'incubations_3_id', u'incubations_3_speed',
       u'incubations_3_duration', u'incubations_3_temperature',
       u'incubations_3_movement', u'incubations_3_speed_unit_id',
       u'incubations_3_duration_unit_id', u'incubations_3_temperature_unit_id',
       u'incubations_4_id', u'incubations_4_speed', u'incubations_4_duration',
       u'incubations_4_temperature', u'incubations_4_movement',
       u'incubations_4_speed_unit_id', u'incubations_4_duration_unit_id',
       u'incubations_4_temperature_unit_id', u'incubations_5_id',
       u'incubations_5_speed', u'incubations_5_duration',
       u'incubations_5_temperature', u'incubations_5_movement',
       u'incubations_5_speed_unit_id', u'incubations_5_duration_unit_id',
       u'incubations_5_temperature_unit_id', u'measurement_units_2_id',
       u'measurement_units_2_code', u'measurement_units_2_long_name',
       u'measurement_units_2_siunitx', u'dna_extractions_1_id',
       u'dna_extractions_1_code', u'dna_extractions_1_protocol_id',
       u'dna_extractions_1_source_id'],
      dtype='object')

This indeed contains all the columns I want - but the naming does not help me select what I want.

Isit possible to preserve the key names from the original table in this dataframe? e.g. instead of measurement_units_1_code I would like to have mass_unit_code.

TheChymera
  • 17,004
  • 14
  • 56
  • 86
  • couldn't you rename the columns to suit your need once loaded into pandas? – Haleemur Ali Mar 22 '16 at 13:31
  • That would just offset the issue, I would need to write an explicit mapping (needless to say I am not doing this just for one table), and make sure that `measurement_units_1_code` **always** corresponds to `mass_unit_code` and not to `volume_unit_code`. I am unsure how the number in `measurement_units_1_code` gets assigned, but it would be so much better if I could get the actual key name instead os some number... – TheChymera Mar 22 '16 at 14:27

1 Answers1

2

This is not what joinedload is supposed to be used for. You want to do an explicit join in this case:

session.query(DNAExtractionProtocol.id.label("id"),
              ...,
              MeasurementUnit.id.label("mass_unit_id"),
              ...) \
       .join(DNAExtractionProtocol.mass_unit) \
       .join(DNAExtractionProtocol.digestion_buffer) \
       ... \
       .filter(...)

If you don't want to type out all those names, you can inspect the DNAExtractionProtocol class to find all relationships and dynamically construct the query and labels. An example:

cols = []
joins = []
insp = inspect(DNAExtractionProtocol)
for name, col in insp.columns.items():
    cols.append(col.label(name))
for name, rel in insp.relationships.items():
    alias = aliased(rel.mapper.class_, name=name)
    for col_name, col in inspect(rel.mapper).columns.items():
        aliased_col = getattr(alias, col.key)
        cols.append(aliased_col.label("{}_{}".format(name, col_name)))
    joins.append((alias, rel.class_attribute))

query = session.query(*cols).select_from(DNAExtractionProtocol)
for join in joins:
    query = query.join(*join)

EDIT: Depending on your data structure you might need to use outerjoin instead of join on the last line.

You'll probably need to tweak this to your liking. For example, this doesn't take into account potential naming conflicts, e.g. for mass_unit_id, is it DNAExtractionProtocol.mass_unit_id or is it MeasurementUnit.id?

In addition, you'll probably want to execute sql_query.statement instead of str(sql_query). str(sql_query) is for printing purposes, not for execution. I believe you don't need to pass params=[code] if you use sql_query.statement because code will already have been bound to the appropriate parameter in the query.

TheChymera
  • 17,004
  • 14
  • 56
  • 86
univerio
  • 19,548
  • 3
  • 66
  • 68
  • Thank you! How can I inspect my class for all `relationship()` keys? – TheChymera Mar 22 '16 at 18:41
  • @TheChymera I've added an example for that. – univerio Mar 22 '16 at 19:23
  • Also, the code above seems to break down if my table uses joined table inheritance (as seen [here](http://stackoverflow.com/questions/33869328/multiple-split-class-associations-in-sqlalchemy)). It seems to be looking in the parent class for keys that are actually provided by the class in question, and crashes with errors like `Tried joining to , but got: Can't find any foreign key relationships between 'protocols' and 'measurement_units'.` – TheChymera Mar 23 '16 at 11:38
  • @TheChymera The relationship inspection code? You need to add some special logic for that. Depending on which tables you need to query, most likely you also need to join the base table. Can you post the entirety of your models, including the ones using table inheritance? – univerio Mar 23 '16 at 17:04
  • [This](https://github.com/TheChymera/medaba/blob/4ce5770212e52d276cc486204ec173fb8c2e9111/src/dbdata/common_classes.py) is my full current database model. – TheChymera Mar 24 '16 at 00:47
  • @TheChymera I've fixed the example. In this case, you're not doing anything special with inheritance; there was simply a bug in the original code. – univerio Mar 24 '16 at 01:55
  • This now gives me `sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) ambiguous column name: measurement_units.id` how is that ambiguous? And how can I fix this one example? (I'm sure there will be more, but fixable along the same lines) – TheChymera Mar 25 '16 at 17:45
  • @TheChymera That's because you're attempting to join twice to the same table -- once through `mass_unit` and once through `volume_unit`. – univerio Mar 25 '16 at 18:52
  • @TheChymera I've updated the example to accommodate that by adding aliases. – univerio Mar 25 '16 at 19:03
  • I implemented the newest changes [here](https://github.com/TheChymera/medaba/blob/d2b22ee86c17f8199b0209e879c2317751089bf6/src/dbdata/fmri.py#L49-L67) and the dataframe extracted from the query now contains all of the columns I would be looking for. Sadly for some reason the dataframe is now blank. The SQL query for the filter seems to be `WHERE protocols.code = :code_1`. I am unsure whether this is correct - shouldn't the filter be the value I store in the `code` variable (which in the current example is `"EPDqEP"`)?. – TheChymera Mar 27 '16 at 22:05
  • Actually, it seems the dataframe is returned empty independently of whether or not I apply a filter. – TheChymera Mar 27 '16 at 22:17
  • 1
    @TheChymera You should consider the generated SQL. I don't know what your data looks like. Perhaps you should do an `OUTER JOIN` instead of an `INNER JOIN`? – univerio Mar 28 '16 at 03:30