19

I have a table such has

identifier date        value
A          2017-01-01  2 
A          2017-01-02  1
A          2017-01-03  7
B          2017-01-01  2 
B          2017-01-02  7
B          2017-01-03  3

I am trying to select the max date of each identifier such as I get :

identifier date        value
A          2017-01-03  7
B          2017-01-03  3

thank you

Ilja Everilä
  • 50,538
  • 7
  • 126
  • 127
Steven G
  • 16,244
  • 8
  • 53
  • 77
  • 2
    This is a very common question and has spawned its own tag: [tag:greatest-n-per-group]. Try searching a bit. Solutions can be a bit DB specific, so please at least provide that information. – Ilja Everilä Aug 20 '17 at 13:50
  • @IljaEverilä I saw that, but there was no equivalent for sql alchemy (python equivalent) – Steven G Aug 20 '17 at 13:52
  • I know that to be untrue. Here's a Postgresql solution for example: https://stackoverflow.com/questions/44069023/sqlalchemy-show-only-latest-result-if-a-join-returns-multiple-results, though it was missing the tag. It's also rather straightforward to implement plain SQL solutions in SQLA. – Ilja Everilä Aug 20 '17 at 13:54
  • 3
    @IljaEverilä not sure where you see a group by in the question you are referring to. My SQL skills are very poor (full disclosure), it`s not always easy to incorporate a half similar solution and make it work – Steven G Aug 20 '17 at 13:59
  • Nowhere as a clause, because it uses a Postgresql specific DISTINCT ON ... ORDER BY combination to achieve greatest n per group, efficiently. It's true that searching for SQL solutions is daunting, if not familiar with the terminology. – Ilja Everilä Aug 20 '17 at 14:01
  • @IljaEverilä see, you are teaching me something. – Steven G Aug 20 '17 at 14:03
  • For future and on the subject of teaching, don't fall in the trap of "XY problem", where you think something is a solution to a problem and then ask about that something, instead of asking about the problem itself. For example here a left join would work as well, not just a (inner) group by query. – Ilja Everilä Aug 20 '17 at 14:31

4 Answers4

44

Using a subquery:

SELECT t1.identifier, t1.date, t1.value FROM table t1
JOIN
(
    SELECT identifier, MAX(date) maxdate
    FROM table
    GROUP BY identifier
) t2
ON t1.identifier = t2.identifier AND t1.date = t2.maxdate;

In SQLAlchemy:

from sqlalchemy import func, and_

subq = session.query(
    Table.identifier,
    func.max(Table.date).label('maxdate')
).group_by(Table.identifier).subquery('t2')

query = session.query(Table).join(
    subq,
    and_(
        Table.identifier == subq.c.identifier,
        Table.date == subq.c.maxdate
    )
)
davidsbro
  • 2,761
  • 4
  • 23
  • 33
r-m-n
  • 14,192
  • 4
  • 69
  • 68
12

With ORM you could use over function that is actually is a window function:

session \
    .query(Table, func.max(Table.date)
           .over(partition_by=Table.identifier, order_by=Table.value))

It returns a tuple (table_instance, latest_datetime). order_by is optional in this case.

The same with SQL Expressions.

Eugene Lopatkin
  • 2,351
  • 1
  • 22
  • 34
2

In SQLAlchemy core, it can be achieved using the following code -

import sqlalchemy as db

query = db.select([
    TABLE.c.identifier,
    db.func.max(USERS.c.date),
    TABLE.c.value,
]).group_by(TABLE.c.identifier)

result = engine.execute(query).fetchall()
Amit Pathak
  • 1,145
  • 11
  • 26
0

in orm you may write it almost as you would in mysql

result = session.query(Table,func.max(Table.date)).group_by(Table.identifier)
for row,i in result:
    print(row.date,row.value,row.identifier,i)