8

Let's say I have an Author table and a Post table, and each Author can have several Posts.

Now, with a single sqlalchemy query, I want to get all of my active Authors and the most recent published Post for each.

I've been trying to go at this by getting a list of Posts that joinedload the Author, using a subquery to group the results together, like this:

subquery = DBSession.query(Author.id, func.max(Post.publish_date).label("publish_date")) \
    .join(Post.author) \
    .filter(Post.state == 'published') \
    .filter(Author.state == 'active') \
    .group_by(Author.id) \
    .subquery()

query = DBSession.query(Post) \
    .options(joinedload(Post.author)) \
    .join(Post.author) \
    .join(subquery, and_(Author.id == subquery.c.id, 
                         Post.publish_date == subquery.c.publish_date))

But if I have two Posts from an Author with the same publish_date, and those are the newest Posts, that means I get that Author appearing twice in my results list. And while I could use a second subquery to eliminate dupes (take func.max(Post.id)), it seems like really, really the wrong way to go about this. Is there a better way to go about this?

(Again, I'm looking for a single query, so I'm trying to avoid querying on the Author table, then looping through and doing a Post query for every Author in my results.)

shroud
  • 725
  • 8
  • 13
  • I feel like [this answer](http://stackoverflow.com/a/2043290/405075) does what I want to do if I were to add an ORDER BY clause, but I don't know how to translate that from SQL into sqlalchemy. – shroud Oct 16 '14 at 05:18
  • If your db supports window functions with the over clause, you could use the approach from [this answer](http://stackoverflow.com/a/17457858/1587090) – Brendon McKeon Oct 16 '14 at 07:14

1 Answers1

11

I would do it as following:

LastPost = aliased(Post, name='last')
last_id = (
    session.query(LastPost.id)
    .filter(LastPost.author_id == Author.id)
    .order_by(LastPost.publish_date.desc())
    .order_by(LastPost.id.desc())
    .limit(1)
    .correlate(Author)
    .as_scalar()
)

query = (
    DBSession.query(Author, Post)
    .outerjoin(Post, Post.id == last_id)
)

for author, last_post in query:
    print(author, last_post)

As you can see, the result is a tuple of pairs (Author, LastPost).
Change outerjoin to join if you only want authors that have at least one Post.
Also, I do not preload any relationship Author.post to avoid any confusion.

van
  • 74,297
  • 13
  • 168
  • 171
  • Note: `as_scalar()` is deprecated in sqlalchemy 1.4 and is replaced by `scalar_subquery()` (see docs: https://docs.sqlalchemy.org/en/14/orm/query.html?highlight=as_scalar#sqlalchemy.orm.Query.scalar_subquery) – ajiang Dec 29 '20 at 16:05