10

I want to get all the columns of a table with max(timestamp) and group by name.

What i have tried so far is: normal_query ="Select max(timestamp) as time from table"

event_list = normal_query \
            .distinct(Table.name)\
            .filter_by(**filter_by_query) \
            .filter(*queries) \
            .group_by(*group_by_fields) \
            .order_by('').all()

the query i get :

SELECT  DISTINCT ON (schema.table.name) , max(timestamp)....

this query basically returns two columns with name and timestamp.

whereas, the query i want :

SELECT DISTINCT ON (schema.table.name) * from table order by ....

which returns all the columns in that table.Which is the expected behavior and i am able to get all the columns, how could i right it down in python to get to this statement?.Basically the asterisk is missing. Can somebody help me?

Ilja Everilä
  • 50,538
  • 7
  • 126
  • 127
Shubham
  • 997
  • 1
  • 9
  • 15
  • It is not entirely clear what you're after (how is the query failing you, what exactly `normal_query` is etc.), and if you've trouble with the asterisk, reading a good SQL tutorial or book will get you up to speed. Note that often the same query can be written in many ways in SQL. – Ilja Everilä Jul 29 '19 at 12:06
  • thanks for pointing out , i have made little update to the question that might help. – Shubham Jul 29 '19 at 12:14
  • 1
    What you seem to actually want is rows distinct on name (**not** necessarily grouped by) ordered by timestamp in descending order. Here's an example of just that: https://stackoverflow.com/questions/44069023/sqlalchemy-show-only-latest-result-if-a-join-returns-multiple-results. An example of the same in plain SQL here: https://stackoverflow.com/questions/16914098/how-to-select-id-with-max-date-group-by-category-in-postgresql/16920077#16920077. – Ilja Everilä Jul 29 '19 at 12:33

1 Answers1

17

What you seem to be after is the DISTINCT ON ... ORDER BY idiom in Postgresql for selecting results (N = 1). So instead of grouping and aggregating just

event_list = Table.query.\
    distinct(Table.name).\
    filter_by(**filter_by_query).\
    filter(*queries).\
    order_by(Table.name, Table.timestamp.desc()).\
    all()

This will end up selecting rows "grouped" by name, having the greatest timestamp value.

You do not want to use the asterisk most of the time, not in your application code anyway, unless you're doing manual ad-hoc queries. The asterisk is basically "all columns from the FROM table/relation", which might then break your assumptions later, if you add columns, reorder them, and such.

In case you'd like to order the resulting rows based on timestamp in the final result, you can use for example Query.from_self() to turn the query to a subquery, and order in the enclosing query:

event_list = Table.query.\
    distinct(Table.name).\
    filter_by(**filter_by_query).\
    filter(*queries).\
    order_by(Table.name, Table.timestamp.desc()).\
    from_self().\
    order_by(Table.timestamp.desc()).\
    all()
Ilja Everilä
  • 50,538
  • 7
  • 126
  • 127
  • can i also add count(Table.id) to the same query not a separate one, but if separate how to join? if so how? or you have link for SqlAlchemy documentation? which explains this properly – Shubham Jul 31 '19 at 12:40
  • 1
    Hmm might be possible using window functions, though I don't think I fully understand what's been sought here. – Ilja Everilä Jul 31 '19 at 13:04
  • can't use windows functions, i just wrote another query for only count and then made an in memory join of both the queries just by simple python dictionary. And it works great!!!. Thanks.. – Shubham Aug 02 '19 at 05:28
  • @ Ilja Everilä from the above query i have a problem now.. sorry to bring it so late. The order_by(Table.name, Table.timestamp.desc()) doesn't works as expected. The result is ordered by Table.name and Table.timestamp is not working. – Shubham Dec 04 '19 at 05:30
  • It is working, but the timestamp order is only used to pick the latest row per group and the main order is defined by name in order to be able to use the `DISTINCT ON name` approach. If you wish to then order the resulting rows by timestamp, turn the original query in to a subquery and add desired final ordering in the enclosing query. – Ilja Everilä Dec 04 '19 at 06:34
  • 1
    Yes , The above query being the subquery, alert_list = table.query .filter(Table.id == subquery.c.id) .order_by(Table.timestamp.desc()) .all() does the work Thanks Again :) – Shubham Dec 04 '19 at 09:10
  • 1
    Another option would be to use `Query.from_self()` in this case, so before the call to `all()` you'd add `...from_self().order_by(Table.timestamp.desc()).all()`. "From self" automatically turns the original query to a subquery and aliases the (primary) entity. That'd avoid the join as well. – Ilja Everilä Dec 04 '19 at 09:20
  • Does anyone know how to implement this in the new SQLAlchemy 1.4/2.0 API? Since Query is now depreciated. Or another way to get `greatest-n-per-group` in a query? – mihow Feb 02 '23 at 03:26
  • @mihow replace `Table.query` with `select(Table)`. – bfontaine Aug 29 '23 at 15:41