-1

It has been really hard for me to write a title for the question, but here is my situation: I have a bot that saves from a public API the prices of gas tagged by gas station id, time of price change, and type of fuel (it can be, for example, "Gas", "Diesel", "Natural Gas", "Special Gas", etc.).

The table gets updated every day: every time the manager of a station communicates a new price, a new record is added. I need to keep track of prices variations, this is the reason why I am not updating the record for that particular fuel type, but adding a new record.

The table looks like this (more or less, the real table is a little bit more complex):

id station_id updated_timestamp price type_of_fuel
1 1 2023-01-19 1.00 Gas
2 1 2023-01-19 1.20 Diesel
3 2 2023-01-19 1.05 Gas
4 1 2023-01-20 1.10 Gas
5 2 2023-01-20 1.10 Gas
6 1 2023-01-21 1.15 Gas

One of my use cases is to return the historical data, and that is no problem, but I also need to show only the latest available price for each station and each type of fuel. In other words, I expect to receive something like

id station_id updated_timestamp price type_of_fuel
2 1 2023-01-19 1.20 Diesel
5 2 2023-01-20 1.10 Gas
6 1 2023-01-21 1.15 Gas

Starting from a basic query

SELECT
  *
FROM
  TABLE

I understood that a JOIN with a subquery might be the solution, something like:

SELECT
  *
FROM
  TABLE
AS
  T
INNER JOIN (
  SELECT
    id,
    station_id,
    MAX(updated_timestamp)
  FROM
    TABLE
  GROUP BY
    station_id
) AS SUB ON T.id = SUB.id

But this is not working as expected. Any idea? I'd like to be able to write this query, and to understand why it works.

Thank you in advance.

Martin
  • 22,212
  • 11
  • 70
  • 132

2 Answers2

2

Since MySQL 8, you can use the ROW_NUMBER window function for this kind of task.

WITH cte AS (
    SELECT *, 
           ROW_NUMBER() OVER(PARTITION BY station_id, type_of_fuel ORDER BY id DESC) AS rn
    FROM tab
)
SELECT id, station_id, updated_timestamp, price, type_of_fuel 
FROM cte
WHERE rn = 1

Check the demo here.


In MySQL 5.X you are forced to compute aggregation separately. Since you want the last price for each station_id and type of fuel, you need to look for the last id, in partition <station_id, type_of_fuel>. Then join back on matching ids.

SELECT tab.*
FROM tab
INNER JOIN (SELECT MAX(id) AS id
            FROM tab
            GROUP BY station_id, type_of_fuel) cte
        ON tab.id = cte.id

Check the demo here.

Note: If the "ID" field gives you a correct ordering with respect to your data, it's preferable to use it, as long as conditions on integers make join operations more efficient than conditions on dates. If that's not the case, you are forced to use dates instead, changing:

  • MAX(id) AS id to MAX(updated_timestamp) AS updated_timestamp
  • ON tab.id = cte.id to ON tab.updated_timestamp = cte.updated_timestamp.
lemon
  • 14,875
  • 6
  • 18
  • 38
  • 1
    No need for `station_id, type_of_fuel` in the select statement on the subquery – Ergest Basha Jan 23 '23 at 15:36
  • The solution for MySQL 5.X is not absolutely correct, because we don't know that the id column is auto incremented. – DreamWave Jan 23 '23 at 16:07
  • If the "*id*" field stores random numbers, or numbers in a random order, there's no purpose to store it in the database. – lemon Jan 23 '23 at 16:12
  • 1
    id is defined as UNIQUE PRIMAY AUTO_INCREMENT but can I be sure that ids are not "reused"? will they always grow, even if I delete something? – simonelippolis Jan 23 '23 at 16:13
  • Yes, you can be sure about it. If you remove records, a new record will keep the latest id +1 – lemon Jan 23 '23 at 16:14
  • you should treat ids as opaque values, even autoimcrement ones, not impute order based on them. for example, if you go and insert a bunch of historical data, the id will not reflect actual temporal order – ysth Jan 23 '23 at 17:55
  • I've added a note to the answer that clears out the decision on using ids rather than dates. – lemon Jan 23 '23 at 18:00
0

Hope this works.

with tsample as (
select '1' as id,'1' as station_id,'2023-01-19' as updated_time,'1.00' as price,'Gas' as type union all
select '2' as id,'1' as station_id,'2023-01-19' as updated_time,'1.20' as price,'Diesel' as type union all
select '3' as id,'2' as station_id,'2023-01-19' as updated_time,'1.05' as price,'Gas' as type union all
select '4' as id,'1' as station_id,'2023-01-20' as updated_time,'1.10' as price,'Gas' as type union all
select '5' as id,'2' as station_id,'2023-01-20' as updated_time,'1.10' as price,'Gas' as type union all
select '6' as id,'1' as station_id,'2023-01-21' as updated_time,'1.15' as price,'Gas' as type 
)
select station_id,type,max(price),max(updated_time),max(id) from tsample group by 1,2
BMX_01
  • 45
  • 4
  • 1
    the max price and time and id could come from 3 different rows; that's not usually what people asking this question want – ysth Jan 23 '23 at 17:58