1

I feel like this will be answered elsewhere, but for the life of me I can't find the correct search term.

I have a subquery that selects 2 values, id and MAX(date). The date is required for getting the latest value when using GROUP BY, but it is not needed after this stage.

How can I "discard" the date column so that I'm able to use the id IN (subquery) statement? In the meantime I'm selecting func.max(Model.id) to get around the issue.

Here's a trimmed down example of what I'm attempting to do:

# Get the IDs of each latest link to a relationship
>>> subquery = session.query(Model.id, func.max(Model.date)).group_by(Model.relationship_id)

# Use these IDs as part of another query
>>> query = session.query(Model.id).filter(Model.id.in_(subquery))
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) sub-select returns 2 columns - expected 1
Peter
  • 3,186
  • 3
  • 26
  • 59
  • 1
    Since you are using SQLite, have you tried `session.query(Model.id).group_by(Model.relationship_id).having(func.max(Model.date))`? Note that using `GROUP BY` and `MAX()` to get greatest-per-group only works in SQLite. – Ilja Everilä Mar 05 '21 at 17:28
  • Thanks, I'm currently using SQLite for testing with the intention of moving to MySQL as soon as IT can set it up. Does that syntax work on both? – Peter Mar 05 '21 at 18:24
  • In MySQL using `MAX()` will not guarantee that the value of `Model.id` within a group is picked from the row with max date — that's something unique to SQLite. It may do so for some query plans, but an update to MySQL or just a change in data can break that. You need to use other methods such as self left/outer join on `relationship_id` and date, picking the row where there is no other row with greater date. – Ilja Everilä Mar 05 '21 at 18:38
  • Newer versions of MySQL — and SQLite — also support window functions, of which `RANK()` or `ROW_NUMBER()` are a good tool for a [tag:greatest-n-per-group] query. – Ilja Everilä Mar 05 '21 at 18:45
  • Sorry forgot to respond earlier, thanks for the heads up. I'll have to experiment when I'm back at work tomorrow. Very helpfully someone asked a [duplicate question](https://stackoverflow.com/q/66493790/2403000) recently and got linked to 5 different questions, fingers crossed I can adapt one of those :) – Peter Mar 08 '21 at 00:29
  • Attempted it tonight actually, `RANK` works very well, thanks a bunch for the suggestion. – Peter Mar 08 '21 at 02:18

1 Answers1

0

The issue you're experiencing is that your subquery is returning 2 columns --> when your main query searches for the id, it doesn't know which column to search through.

You can specify which subquery column to filter on using c.column_name:

query = session.query(Model.id).filter(Model.id == subquery.subquery().c.id)

Alternatively, since your subquery shares a key with your main query, you can perform a join:
(Might make your life easier in some scenarios)

# This should work, but if it doesn't, may need to specify "on" param
query = session.query(Model.id).join(subquery.subquery())
Yaakov Bressler
  • 9,056
  • 2
  • 45
  • 69
  • 1
    Thanks, though in this case the subquery is a [Query](https://docs.sqlalchemy.org/en/14/orm/query.html#sqlalchemy.orm.Query) object and has no `c` or `columns` attributes. I'm guessing you were thinking of it as if it's already been executed? – Peter Mar 08 '21 at 01:09
  • I see. Solve that by adding `.subquery()` to the end of your subquery. @Peter _Else, perhaps you can reference your subquery directly (without `.c`?)_ – Yaakov Bressler Mar 08 '21 at 01:38
  • Ah so `in_()` doesn't accept a column, but it does seem to do the same thing with `==`. If you update your answer I'll mark it as accepted :) - `.filter(Model.id==subquery.subquery().c.id)` – Peter Mar 08 '21 at 02:09