2

I have a list of components which periodically report status.

I want to write a query to find a list of the most recent status grouped by component.

Typically I would use the solution described by the accepted answer for this question: MySQL order by before group by

However there can be multiple statuses reported each second, so there is no guarantee that I will retrieve the most recent. Therefore I would like to find the status with the most recent timestamp and in the case of duplicate timestamps, the one with the highest PK.

Ideally I would like to have a query like the following:

SELECT *
FROM component_status 
ORDER BY component_status.timestamp DESC, component_status.component_status_id DESC
GROUP BY component_status.component_id;

However you cannot perform a GROUP BY after the ORDER BY.

Has anybody had a similar problem and found a solution?

Community
  • 1
  • 1
Fooble
  • 485
  • 2
  • 14

3 Answers3

1

You can use variables to simulate

 ROW_NUMBER() OVER (PARTITION BY component_id 
                    ORDER BY `timestamp` DESC, component_status_id DESC)

window function:

SELECT component_id, component_status_id, `timestamp`
FROM (
SELECT component_id, component_status_id, `timestamp`, 
       @row_number:= 
          IF (@cid <> component_id,
             IF (@cid := component_id, 1, 1),
             IF (@cid := component_id, @row_number + 1, @row_number + 1)) AS rn   
FROM component_status
CROSS JOIN (SELECT @row_number:= 0, @cid := -1) vars
ORDER BY `timestamp` DESC, component_status_id DESC ) t
WHERE rn = 1

rn=1 in the outer query selects the most recent record per component_id. In case there are two or more records having the same timestamp, then the one having the greatest component_status_id will be selected.

Demo here

Giorgos Betsos
  • 71,379
  • 9
  • 63
  • 98
0

It will not provide correct results as order by is worked after group by, for this you can first get order by results in sub-query then you can group them.

for example-

select field1, field2 from (SELECT field1,field2,...,component_status.component_id  
FROM component_status 
ORDER BY component_status.timestamp DESC, component_status.component_status_id DESC) a 
GROUP BY a.component_id;
Zafar Malik
  • 6,734
  • 2
  • 19
  • 30
  • I tried something like this but there are millions and millions of records and it was extremely slow and inefficient (even with indexing). – Fooble Jun 24 '15 at 00:09
  • yes it will be slow if table size is large, if it is one time activity then you have to bear it but if it is required in production then there should be some data filteration to optimize it instead of full table data scan. – Zafar Malik Jun 24 '15 at 04:18
0

I ended up going with the following query to solve my problem:

SELECT 
    component_status.*
FROM
    component_status
JOIN
    (SELECT 
        MAX(component_status_id) AS component_status_id
    FROM
        component_status
    JOIN 
        (SELECT 
            MAX(timestamp) AS timestamp, component_id
        FROM
            component_status
        WHERE
            timestamp <= NOW()
        GROUP BY component_id) AS most_recent_status 
    USING (component_id)
    WHERE component_status.timestamp = most_recent_status.timestamp
    GROUP BY component_id) AS most_recent_status 
USING (component_status_id)

With a composite index on component_id and timestamp the query is instantaneous.

Fooble
  • 485
  • 2
  • 14