0

Let's say I have these rows in a table:

ID | SomeColumn | SomeDateTime
---+------------+-------------
1  | X1         | 01-01-14 14:00:00
2  | X2         | 01-01-14 22:00:00
3  | Y1         | 01-03-14 12:00:00
4  | Y2         | 01-03-14 23:00:00

How could I make an SQL query to grab just the rows with distinct dates (choosing the latest times)? That is, the result of the query should be:

ID | SomeColumn | SomeDateTime
---+------------+-------------
2  | X2         | 01-01-14 22:00:00
4  | Y2         | 01-03-14 23:00:00

I've tried this but it doesn't work:

SELECT MAX(SomeDateTime),ID,SomeColumn,SomeDateTime
FROM MyTable GROUP BY (SomeDateTime::date)

The error I get is:

The column "MyTable.ID" needs to appear in the GROUP BY clause or be used in an aggregation function

I'm using PostgreSQL, but if you have answers for other DB engines, I'll upvote too.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
knocte
  • 16,941
  • 11
  • 79
  • 125

2 Answers2

1
SELECT DISTINCT ON (some_datetime::date)
       id, some_column, some_datetime
FROM   tbl 
ORDER  BY some_datetime::date, some_datetime DESC;

Detailed explanation for DISTINCT ON:
Select first row in each GROUP BY group?

The slightly special requirement here is to form partitions on a derived value. A cast to date is the simplest and fastest way.

Returns a single row per day (for days with any rows). If there are ties, an arbitrary row is picked, unless you add more ORDER BY items as tiebreaker.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • thanks Erwin, I've also found another solution which I posted as another answer, do you see any disadvantages with it? – knocte Mar 31 '14 at 16:47
0

I've found a fairly readable solution (even if it's a query with a subquery, which might be less efficient than the proposed solutions):

SELECT * FROM MyTable WHERE SomeDateTime IN (
    SELECT MAX(SomeDateTime) FROM MyTable GROUP BY (SomeDateTime::date)
)
knocte
  • 16,941
  • 11
  • 79
  • 125
  • 1
    If `SomeDateTime` is not unique, there is a difference. While this one returns *all* rows matching the maximum timestamp per day, my solution returns a maximum of *one* row per day. Same difference as with [`row_number()` vs. `rank()`](http://stackoverflow.com/questions/13410315/how-to-select-more-than-1-record-per-day/13414130#13414130). It depends what you need exactly. To see which is faster just test with [`EXPLAIN ANALYZE`](http://stackoverflow.com/questions/18867836/poor-performance-on-a-postgresql-query/19460872#19460872). – Erwin Brandstetter Mar 31 '14 at 19:13