36

Daft SQL question. I have a table like so ('pid' is auto-increment primary col)

CREATE TABLE theTable (
    `pid` INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    `timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    `cost` INT UNSIGNED NOT NULL,
    `rid` INT NOT NULL,
) Engine=InnoDB;

Actual table data:

INSERT INTO theTable (`pid`, `timestamp`, `cost`, `rid`)
VALUES
  (1, '2011-04-14 01:05:07', 1122, 1),
  (2, '2011-04-14 00:05:07', 2233, 1),
  (3, '2011-04-14 01:05:41', 4455, 2),
  (4, '2011-04-14 01:01:11', 5566, 2),
  (5, '2011-04-14 01:06:06', 345, 1),
  (6, '2011-04-13 22:06:06', 543, 2),
  (7, '2011-04-14 01:14:14', 5435, 3),
  (8, '2011-04-14 01:10:13', 6767, 3)
;

I want to get the PID of the latest row for each rid (1 result per unique RID). For the sample data, I'd like:

pid | MAX(timestamp)      | rid
-----------------------------------
5   | 2011-04-14 01:06:06 | 1
3   | 2011-04-14 01:05:41 | 2
7   | 2011-04-14 01:14:14 | 3

I've tried running the following query:

SELECT MAX(timestamp),rid,pid FROM theTable GROUP BY rid

and I get:

max(timestamp)     ; rid; pid
----------------------------
2011-04-14 01:06:06; 1  ; 1
2011-04-14 01:05:41; 2  ; 3
2011-04-14 01:14:14; 3  ; 7

The PID returned is always the first occurence of PID for an RID (row / pid 1 is frst time rid 1 is used, row / pid 3 the first time RID 2 is used, row / pid 7 is first time rid 3 is used). Though returning the max timestamp for each rid, the pids are not the pids for the timestamps from the original table. What query would give me the results I'm looking for?

HoldOffHunger
  • 18,769
  • 10
  • 104
  • 133
codinghands
  • 1,741
  • 2
  • 18
  • 31
  • Can you show the result of the exact query you are running? – AJ. Apr 14 '11 at 01:09
  • possible duplicate of [Fetch the row which has the Max value for a column](http://stackoverflow.com/questions/121387/fetch-the-row-which-has-the-max-value-for-a-column) – outis Dec 23 '11 at 02:33
  • Proper [sample code](http://sscce.org/) (here, SQL statements) is more useful than any ad hoc schema and sample data format. Please use `CREATE TABLE` and `INSERT ... VALUES` for samples. Desired results don't need to be presented as sample code, as results are the output of code and not code themselves. – outis Dec 26 '11 at 07:13

7 Answers7

64

(Tested in PostgreSQL 9.something)

Identify the rid and timestamp.

select rid, max(timestamp) as ts
from test
group by rid;

1   2011-04-14 18:46:00
2   2011-04-14 14:59:00

Join to it.

select test.pid, test.cost, test.timestamp, test.rid
from test
inner join 
    (select rid, max(timestamp) as ts
    from test
    group by rid) maxt
on (test.rid = maxt.rid and test.timestamp = maxt.ts)
Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
  • Magic, worked a treat. Any idea why 'SELECT MAX(timestamp),rid,pid FROM theTable GROUP BY rid' didn't work? – codinghands Apr 14 '11 at 01:45
  • 2
    Because you want ***for every rid*** (the `GROUP BY rid`), to show "***maximum timestamp***" (the `MAX(timestamp)`) and for that row, with maximum timestamp, the ***related pid***. This is where your idea gets stuck. You need a "windowing" function to do that, or to group and then `JOIN` to the grouped subquery, as Catcall's solution. MYSQL does not have windowing functions. – ypercubeᵀᴹ Apr 14 '11 at 09:40
  • 3
    Even worse, MySQL does not raise an error but fetches a pid from a (random) row. – ypercubeᵀᴹ Apr 14 '11 at 09:43
  • 1
    @codinghands: Because of a misfeature in MySQL that lets you omit some unaggregated columns from the GROUP BY clause. http://dev.mysql.com/doc/refman/5.5/en/group-by-hidden-columns.html It's not being able to omit columns that's bad; it's being able to omit columns that aren't functionally dependent on any of the included columns that's bad. – Mike Sherrill 'Cat Recall' Apr 14 '11 at 10:21
  • @ypercube @Catcall thanks for replies - good to know for future. Fetching a random pid without an error seems mad... – codinghands Apr 14 '11 at 13:44
  • Query took 0.0596 sec with an index on rid and timestamp, check out the other method I posted. – Caio Iglesias Mar 06 '13 at 17:23
  • never would have figured this out on my own, thanks! – user1279741 Sep 09 '15 at 22:39
  • this works great unless you have multiple items at the same timestamp. I needed to add another `MAX`'d field in the inner join, and an `and` on the `on` on the outer to reflect that field – Horse Jun 29 '17 at 12:52
9
select *
from (
    select `pid`, `timestamp`, `cost`, `rid`
    from theTable 
    order by `timestamp` desc
) as mynewtable
group by mynewtable.`rid`
order by mynewtable.`timestamp`

Hope I helped !

Benesh
  • 3,398
  • 1
  • 18
  • 38
anzize
  • 91
  • 1
  • 1
5
SELECT t.pid, t.cost, to.timestamp, t.rid
FROM test as t
JOIN (
    SELECT rid, max(tempstamp) AS maxtimestamp
    FROM test GROUP BY rid
) AS tmax
    ON t.pid = tmax.pid and t.timestamp = tmax.maxtimestamp
Jeshurun
  • 22,940
  • 6
  • 79
  • 92
dkretz
  • 37,399
  • 13
  • 80
  • 138
2

I created an index on rid and timestamp.

SELECT test.pid, test.cost, test.timestamp, test.rid
FROM theTable AS test
LEFT JOIN theTable maxt 
ON maxt.rid = test.rid
AND maxt.timestamp > test.timestamp
WHERE maxt.rid IS NULL 

Showing rows 0 - 2 (3 total, Query took 0.0104 sec)

This method will select all the desired values from theTable (test), left joining itself (maxt) on all timestamps higher than the one on test with the same rid. When the timestamp is already the highest one on test there are no matches on maxt - which is what we are looking for - values on maxt become NULL. Now we use the WHERE clause maxt.rid IS NULL or any other column on maxt.

Caio Iglesias
  • 594
  • 9
  • 24
0

If you want to avoid a JOIN, you can use:

SELECT pid, rid FROM theTable t1 WHERE t1.pid IN ( SELECT MAX(t2.pid) FROM theTable t2 GROUP BY t2.rid);
  • Based on the data, the higher PID is not necessarily associated with the later timestamp. – Darius X. Oct 15 '14 at 16:28
  • You are right. It only works if timestamp is not modified anywhere in the program and only stores de creation_date. If not, as the pid is autoinc and timestamp current_timestamp I asume higher PID do corresponds to later timestamp – Nicolás Previale Nov 05 '14 at 15:51
0

You could also have subqueries like that:

SELECT ( SELECT MIN(t2.pid)
         FROM test t2
         WHERE t2.rid = t.rid
           AND t2.timestamp = maxtimestamp
       ) AS pid 
     , MAX(t.timestamp) AS maxtimestamp
     , t.rid
FROM test t
GROUP BY t.rid

But this way, you'll need one more subquery if you want cost included in the shown columns, etc.

So, the group by and join is better solution.

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
-1

Try:

select pid,cost, timestamp, rid from theTable order by timestamp DESC limit 2;
vy32
  • 28,461
  • 37
  • 122
  • 246