SQLAlchemy not building JOIN on select correctly

Question

I'm trying to translate the following query into a SQLAlchemy ORM query:

SELECT applications.*, 
       appversions.version 
FROM   applications 
       JOIN appversions 
         ON appversions.id = (SELECT id 
                              FROM   appversions 
                              WHERE  appversions.app_id = applications.id 
                              ORDER  BY sort_ver DESC 
                              LIMIT  1)

The model for the tables are as follows:

Base = declarative_base()

class Application(Base):
    __tablename__ = 'applications'

    id       = Column(Integer, primary_key = True)
    group    = Column(Unicode(128))
    artifact = Column(Unicode(128))

    versions = relationship("AppVersion", backref = "application")

class AppVersion(Base):
    __tablename__ = 'versions'

    id        = Column(Integer, primary_key = True)
    app_id    = Column(Integer, ForeignKey('applications.id'))
    version   = Column(Unicode(64))
    sort_ver  = Column(Unicode(64))

And the query I've so far come up with is:

subquery = select([AppVersion.id]). \
                where(AppVersion.app_id == Application.id). \
                order_by(AppVersion.sort_ver). \
                limit(1). \
                alias()


query = session.query(Application). \
                join(AppVersion, AppVersion.id == subquery.c.id) \
                .all()

However, this is producing the following SQL statement and error:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such column: anon_1.id
[SQL: SELECT applications.id AS applications_id, applications."group" AS applications_group, applications.artifact AS applications_artifact
FROM applications JOIN versions ON versions.id = anon_1.id]

I have tried various different methods to produce the subquery and attempting to 'tack on' the sub-SELECT command, but without any positive impact.

Is there a way to coerce the SQLAlchemy query builder to correctly append the sub-SELECT?

Check https://stackoverflow.com/a/6325582/2681632. Basically compare against the whole subquery, not a column of it. — Ilja Everilä, Dec 17 '19 at 19:04

score 1 · Accepted Answer · answered Dec 19 '19 at 11:13

With thanks to @Ilja Everilä for the nudge in the right direction, the code to generate the correct query is:

subquery = select([AppVersion.id]). \
                where(AppVersion.app_id == Application.id). \
                order_by(AppVersion.sort_ver). \
                limit(1). \
                correlate(Application)

query = session.query(Application). \
                join(AppVersion, AppVersion.id == subquery) \
                .all()

The main change is to use the correlate() method, which alters how SQLAlchemy constructs the subquery.

To explain why this works requires some understanding of how SQL subqueries are categorised and handled. The best explanation I have found is from https://www.geeksforgeeks.org/sql-correlated-subqueries/:

With a normal nested subquery, the inner SELECT query runs first and executes once, returning values to be used by the main query. A correlated subquery, however, executes once for each candidate row considered by the outer query. In other words, the inner query is driven by the outer query.

SQLAlchemy not building JOIN on select correctly

1 Answers1