SQLAlchemy use IN with first value only of a subquery

Question

I feel like this will be answered elsewhere, but for the life of me I can't find the correct search term.

I have a subquery that selects 2 values, id and MAX(date). The date is required for getting the latest value when using GROUP BY, but it is not needed after this stage.

How can I "discard" the date column so that I'm able to use the id IN (subquery) statement? In the meantime I'm selecting func.max(Model.id) to get around the issue.

Here's a trimmed down example of what I'm attempting to do:

# Get the IDs of each latest link to a relationship
>>> subquery = session.query(Model.id, func.max(Model.date)).group_by(Model.relationship_id)

# Use these IDs as part of another query
>>> query = session.query(Model.id).filter(Model.id.in_(subquery))
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) sub-select returns 2 columns - expected 1

Since you are using SQLite, have you tried `session.query(Model.id).group_by(Model.relationship_id).having(func.max(Model.date))`? Note that using `GROUP BY` and `MAX()` to get greatest-per-group only works in SQLite. — Ilja Everilä, Mar 05 '21 at 17:28
Thanks, I'm currently using SQLite for testing with the intention of moving to MySQL as soon as IT can set it up. Does that syntax work on both? — Peter, Mar 05 '21 at 18:24
In MySQL using `MAX()` will not guarantee that the value of `Model.id` within a group is picked from the row with max date — that's something unique to SQLite. It may do so for some query plans, but an update to MySQL or just a change in data can break that. You need to use other methods such as self left/outer join on `relationship_id` and date, picking the row where there is no other row with greater date. — Ilja Everilä, Mar 05 '21 at 18:38
Newer versions of MySQL — and SQLite — also support window functions, of which `RANK()` or `ROW_NUMBER()` are a good tool for a [tag:greatest-n-per-group] query. — Ilja Everilä, Mar 05 '21 at 18:45
Sorry forgot to respond earlier, thanks for the heads up. I'll have to experiment when I'm back at work tomorrow. Very helpfully someone asked a [duplicate question](https://stackoverflow.com/q/66493790/2403000) recently and got linked to 5 different questions, fingers crossed I can adapt one of those :) — Peter, Mar 08 '21 at 00:29
Attempted it tonight actually, `RANK` works very well, thanks a bunch for the suggestion. — Peter, Mar 08 '21 at 02:18

Yaakov Bressler · Accepted Answer · 2021-03-08T02:25:09.930

0

The issue you're experiencing is that your subquery is returning 2 columns --> when your main query searches for the id, it doesn't know which column to search through.

You can specify which subquery column to filter on using c.column_name:

query = session.query(Model.id).filter(Model.id == subquery.subquery().c.id)

Alternatively, since your subquery shares a key with your main query, you can perform a join:
(Might make your life easier in some scenarios)

# This should work, but if it doesn't, may need to specify "on" param
query = session.query(Model.id).join(subquery.subquery())

edited Mar 08 '21 at 02:25

answered Mar 07 '21 at 17:10

Yaakov Bressler

9,056
2
45
69

1

Thanks, though in this case the subquery is a [Query](https://docs.sqlalchemy.org/en/14/orm/query.html#sqlalchemy.orm.Query) object and has no `c` or `columns` attributes. I'm guessing you were thinking of it as if it's already been executed? – Peter Mar 08 '21 at 01:09
I see. Solve that by adding `.subquery()` to the end of your subquery. @Peter _Else, perhaps you can reference your subquery directly (without `.c`?)_ – Yaakov Bressler Mar 08 '21 at 01:38
Ah so `in_()` doesn't accept a column, but it does seem to do the same thing with `==`. If you update your answer I'll mark it as accepted :) - `.filter(Model.id==subquery.subquery().c.id)` – Peter Mar 08 '21 at 02:09

SQLAlchemy use IN with first value only of a subquery

1 Answers1