Select only the most recent value for each identifier in a table

Question

I have a table that looks like this:

identifier | value | tstamp
-----------+-------+---------------------
abc        | 21    | 2014-01-05 05:24:31
xyz        | 16    | 2014-01-11 03:32:04
sdf        | 11    | 2014-02-06 07:04:24
qwe        | 24    | 2014-02-14 02:12:07
abc        | 23    | 2014-02-17 08:45:24
sdf        | 15    | 2014-03-21 11:23:17
xyz        | 19    | 2014-03-27 09:52:37

I know how to get the most recent value for a single identifier:

select * from table where identifier = 'abc' order by tstamp desc limit 1;

But I want to get the most recent value for all identifiers. How can I do this?

score 8 · Accepted Answer · edited May 23 '17 at 10:26

8

The simplest (and often fastest) way is DISTINCT ON in Postgres:

SELECT DISTINCT ON (identifier) *
FROM   tbl
ORDER  BY identifier, tstamp DESC;

This also returns an ordered list.
SQLFiddle.
Details:
Select first row in each GROUP BY group?

edited May 23 '17 at 10:26

Community

1
1

answered Jun 04 '14 at 22:08

Erwin Brandstetter

605,456
145
1,078
1,228

Thanks, I've chosen your answer. It works, and is the fastest according to `explain analyze`. – BLuFeNiX Jun 05 '14 at 16:20
2

+1, I had no idea about `DISTINCT ON ()` in postgres – Lamak Jun 05 '14 at 16:25

score 5 · Answer 2 · answered Jun 04 '14 at 21:42

5

SELECT *
FROM (  SELECT  *,
                ROW_NUMBER() OVER(PARTITION BY identifier 
                                  ORDER BY tstamp DESC)  AS RN
        FROM YourTable) AS T
WHERE RN = 1

Here is an sqlfiddle with a demo of this.

The results are:

╔════════════╦═══════╦═════════════════════════════════╦════╗
║ IDENTIFIER ║ VALUE ║             TSTAMP              ║ RN ║
╠════════════╬═══════╬═════════════════════════════════╬════╣
║ abc        ║    23 ║ February, 17 2014 08:45:24+0000 ║  1 ║
║ qwe        ║    24 ║ February, 14 2014 02:12:07+0000 ║  1 ║
║ sdf        ║    15 ║ March, 21 2014 11:23:17+0000    ║  1 ║
║ xyz        ║    19 ║ March, 27 2014 09:52:37+0000    ║  1 ║
╚════════════╩═══════╩═════════════════════════════════╩════╝

answered Jun 04 '14 at 21:42

Lamak

69,480
12
108
116

Thanks! Are there any benefits to this approach compared to the `DISTINCT ON` answer? – BLuFeNiX Jun 05 '14 at 16:22
@BLuFeNiX I actually didn't know about `DISTINCT ON` in postgres, so I can't really compare both methods. – Lamak Jun 05 '14 at 16:25
2

@BLuFeNiX: One benefit would be that window functions work in all modern RDBMS (except MySQL), while `DISTINCT ON` is a Postgres specific extension of the standard SQL `DISTINCT` clause. – Erwin Brandstetter Jun 05 '14 at 16:27
1

You can also resolve ties easily and in different ways - using other ranking functions like `RANK()` and `DENSE_RANK()` - which you can't do with `DISTINCT ON`. – ypercubeᵀᴹ Jun 05 '14 at 16:31
this needs a desc index,i dont want to maintain 2 indexes as that slows performance, anything that can be done about it – PirateApp Jun 12 '18 at 05:35

score 2 · Answer 3 · edited Jun 05 '14 at 16:57

2

Filter out everything but the newest identifier like this:

select * from table t
where not exists 
    ( SELECT 1
      FROM table x
      WHERE x.identifier = t.identifier
        AND x.tstamp > t.tstamp
     ) ;

edited Jun 05 '14 at 16:57

ypercubeᵀᴹ

113,259
19
174
235

answered Jun 04 '14 at 21:42

comfortablydrei

316
1
4

I'm not sure what this is supposed to do; I think you may have misinterpreted my question. Thanks, though! – BLuFeNiX Jun 05 '14 at 16:16
@BLuFeNiX The query is fine (now, after the edit.) It's a valid way to solve the problem. – ypercubeᵀᴹ Jun 05 '14 at 16:58
When I originally posted, I forgot to include x.identifier = t.identifier. Thank you for fixing @ypercube – comfortablydrei Jun 06 '14 at 16:44

score 1 · Answer 4 · edited Jun 05 '14 at 17:01

1

Simply join the table with itself, Add "<" as a join condition, and use the result where the right timestamp is null - then there is NO larger item for the defined identifier)

 SELECT t1.* FROM tbl t1
   LEFT JOIN tbl t2
   ON t1.identifier = t2.identifier AND
    t1.tstamp < t2.tstamp
   WHERE t2.tstamp is null

Stolen fiddle: http://sqlfiddle.com/#!15/39e7a/4

edited Jun 05 '14 at 17:01

ypercubeᵀᴹ

113,259
19
174
235

answered Jun 04 '14 at 21:43

dognose

20,360
9
61
107

Thanks! Are there any benefits to this approach compared to the `DISTINCT ON` answer? – BLuFeNiX Jun 05 '14 at 16:22

Select only the most recent value for each identifier in a table

4 Answers4