26

I am storing the response to various rpc calls in a mysql table with the following fields:

Table: rpc_responses

timestamp   (date)
method      (varchar)
id          (varchar)
response    (mediumtext)

PRIMARY KEY(timestamp,method,id)

What is the best method of selecting the most recent responses for all existing combinations of method and id?

  • For each date there can only be one response for a given method/id.

  • Not all call combinations are necessarily present for a given date.

  • There are dozens of methods, thousands of ids and at least 365 different dates

Sample data:

timestamp  method  id response
2009-01-10 getThud 16 "....."
2009-01-10 getFoo  12 "....."
2009-01-10 getBar  12 "....."
2009-01-11 getFoo  12 "....."
2009-01-11 getBar  16 "....."

Desired result:

2009-01-10 getThud 16 "....."
2009-01-10 getBar 12 "....."
2009-01-11 getFoo 12 "....."
2009-01-11 getBar 16 "....."

(I don't think this is the same question - it won't give me the most recent response)

Community
  • 1
  • 1
Ken
  • 77,016
  • 30
  • 84
  • 101
  • MySQL manual has a similar case: [The Rows Holding the Group-wise Maximum of a Certain Column](https://dev.mysql.com/doc/refman/5.6/en/example-maximum-column-group-row.html) – cgaldiolo Oct 23 '15 at 15:16

8 Answers8

30

This solution was updated recently.
Comments below may be outdated

This can query may perform well, because there are no joins.

SELECT * FROM (
    SELECT *,if(@last_method=method,0,1) as new_method_group,@last_method:=method 
    FROM rpc_responses 
    ORDER BY method,timestamp DESC
) as t1
WHERE new_method_group=1;

Given that you want one resulting row per method this solution should work, using mysql variables to avoid a JOIN.

FYI, PostgreSQL has a way of doing this built into the language:

SELECT DISTINCT ON (method) timestamp, method, id, response
FROM rpc_responses
WHERE 1 # some where clause here
ORDER BY method, timestamp DESC
velcrow
  • 6,336
  • 4
  • 29
  • 21
  • 5
    This method appears to depend on the fact that the GROUP BY will collapse the found rows in t1 to only the first. Is this guaranteed in MySQL? – mkoistinen Dec 18 '12 at 12:18
  • 1
    Not SQL standard, but yes, it is guaranteed in MySQL. What guarantees it is the "ORDER BY timestamp DESC". If someone enables 'ONLY_FULL_GROUP_BY' mode, it will cease to work though. see stackoverflow.com/a/9797138/461096 stackoverflow.com/a/1066504/461096 rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html – velcrow Feb 18 '13 at 18:04
  • For me the grouping did not work, until i added a "DISTINCT" in the inner query. Dont know why, and there is no logical reason for this behaviour, but seems to work. Without the DISTINCT the query did not always pick the first row of the inner query. But genious idea, never would think of this on my own. – Gunni Sep 24 '14 at 12:03
  • Why am I made nervous to learn that `ORDER BY`, which I previously thought was just a tool to make reading output easier, actually plays a huge role in `GROUP BY`... – tumultous_rooster Sep 29 '15 at 00:14
  • 10
    This is wrong. From [MySQL manual](https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html): "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses." – cgaldiolo Oct 23 '15 at 14:34
  • The sub-query solution with join works will all types of DBMS though. however this one, works with MySql defaults only. which is still good :D – mostafa.S Feb 13 '16 at 10:44
  • 2
    @cgaldiolo is correct here! This is a terrible answer! There is no guarantee that this will work under all circumstances with current MySQL version, let alone any future versions. – Jannes Mar 10 '16 at 12:37
  • @Jannes cgaldiolo's answer would be totally correct IF there is only one query existed. However, in the answer, GROUP BY is used on the OUTSIDE query. Yes the GROUP BY does not care ORDER BY in the SAME query but this case with being on the outside query, GROUP BY already receives ordered rows. HOWEVER, the answer is still too risky to use in production level, I would not use it. – Tarik May 10 '17 at 11:02
  • @Tarik That's totally implementation dependent and IF it works, it's purely coincidental and never future proof. For one, I'm pretty sure the inner ORDER BY is simply dropped completely by the optimizer in recent MySQL versions, so the resultset is actually never sorted at all (except by the GROUP BY eventually, but if I'm not mistaken they're dropping that too in an upcoming version). This really should not be the most upvoted answer at all. – Jannes May 10 '17 at 14:10
  • 1
    This is the [fastest code](https://stackoverflow.com/a/72983882/11154841) of the three main answers here. – questionto42 Jul 14 '22 at 18:09
15

Self answered, but I'm not sure that it will be an efficient enough solution as the table grows:

SELECT timestamp,method,id,response FROM rpc_responses 
INNER JOIN
(SELECT max(timestamp) as timestamp,method,id FROM rpc_responses GROUP BY method,id) latest
USING (timestamp,method,id);
questionto42
  • 7,175
  • 4
  • 57
  • 90
Ken
  • 77,016
  • 30
  • 84
  • 101
  • 1
    As far as I know, you have to use a subquery to get what you want. – Adam Bellaire Jan 12 '09 at 16:09
  • 3
    sorry for reviving this after so long, but shouldn't the `max(timestamp)` in the subquery have an alias called `timestamp` ? Otherwise, mysql gives an error: `SQL Error (1054): Unknown column 'timestamp' in 'from clause'`, because USING() requires both tables to have the same column names (I tried it in mysql version 5.1 and 5.5). Adding the alias solves the issue. – DiegoDD Oct 24 '16 at 16:05
  • This is the [second fastest code](https://stackoverflow.com/a/72983882/11154841) of the three main answers here. – questionto42 Jul 14 '22 at 18:08
6

Try this...

SELECT o1.id, o1.timestamp, o1.method, o1.response   
FROM rpc_responses o1
WHERE o1.timestamp = ( SELECT max(o2.timestamp)
                       FROM rpc_responses o2
                       WHERE o1.id = o2.id )
ORDER BY o1.timestamp, o1.method, o1.response

...it even works in Access!

Martin F
  • 590
  • 7
  • 28
versek
  • 61
  • 1
  • 1
0

i used this,worked for me

select max(timestamp),method,id from tables where 1 group by method,id order by timestamp desc 
charles
  • 307
  • 5
  • 5
0

Subquery is very taxing when the data set becomes larger.

Try this:

SELECT t1.* 
FROM rpc_responses AS t1 
INNER JOIN rpc_responses AS t2 
GROUP BY t1.method, t1.id, t1.timestamp
HAVING t1.timestamp=MAX(t2.timestamp)    
ORDER BY t1.timestamp, t1.method, t1.response;
Mike_OBrien
  • 1,395
  • 15
  • 34
Simon
  • 17
  • 1
0

Checking the three main answers in some other use case shows that the most voted answer is also by far the fastest, swarm intelligence works here:

# Answer 1: https://stackoverflow.com/a/12625667/11154841
# 165ms
SELECT
    COUNT(0)
FROM
    (
    SELECT
        mtn.my_primary_key,
        mtn.my_info_col,
        IF(@last_my_primary_key = my_primary_key,
        0,
        1) AS new_my_primary_key_group,
        @last_my_primary_key := my_primary_key
    FROM
        my_db_schema.my_table_name mtn
    WHERE
        mtn.date_time_col > now() - INTERVAL 1 MONTH
    ORDER BY
        my_primary_key,
        mtn.date_time_col DESC
) AS t1
WHERE
    new_my_primary_key_group = 1
    AND t1.my_info_col = 'delete';

# Answer 2: https://stackoverflow.com/a/435709/11154841
# 757ms
SELECT
    count(0)
FROM
    my_db_schema.my_table_name mtn
JOIN
(
    SELECT
        my_primary_key,
        max(date_time_col) AS date_time_col
    FROM
        my_db_schema.my_table_name mtn
    WHERE
        mtn.date_time_col > now() - INTERVAL 1 MONTH
    GROUP BY
        mtn.my_primary_key) latest
        USING (my_primary_key,
    date_time_col)
WHERE
    mtn.my_info_col = 'delete';

# Answer 3: https://stackoverflow.com/a/3185644/11154841
# 1.310s
SELECT
    count(0)
FROM
    my_db_schema.my_table_name mtn
WHERE
    mtn.date_time_col = (
    SELECT
        max(mtn2.date_time_col)
    FROM
        my_db_schema.my_table_name mtn2
    WHERE
        mtn2.my_primary_key = mtn.my_primary_key
        AND mtn2.date_time_col > now() - INTERVAL 1 MONTH
)
    AND mtn.date_time_col > now() - INTERVAL 1 MONTH
    AND mtn.my_info_col = 'delete';
questionto42
  • 7,175
  • 4
  • 57
  • 90
-1

The concept of "most recent" is fairly vague. If you mean something like the 100 most recent rows then you can just add a TOP(100) to your SELECT clause.

If you mean the "most recent" based on a the most recent date then you can just do

SELECT timestamp,method,id,response 
FROM rpc_responses
HAVING max(timestamp) = timestamp 
Adriano Carneiro
  • 57,693
  • 12
  • 90
  • 123
Neil
  • 2,316
  • 1
  • 22
  • 26
  • 1
    I want the most recent record for each combination of method/id. Not all combinations are changed with every timestamp so I can't just specify the latest timestamp. – Ken Jan 12 '09 at 15:30
  • 2
    HAVING max(timestamp) = timestamp gives me an empty set – Ken Jan 12 '09 at 15:47
-4

...is more than one year later but i might help someone To select all the queries starting from latest

SELECT *
FROM rpc_responses
ORDER BY timestamp DESC
Adriano Carneiro
  • 57,693
  • 12
  • 90
  • 123
spi
  • 41
  • 2