I have a table, car_data, as follows:
id | car | date | mileage |
---|---|---|---|
1 | car_4 | 2021-01-05 | 10000 |
2 | car_1 | 2021-01-10 | 20 |
3 | car_3 | 2021-01-11 | 300000 |
4 | car_2 | 2021-01-31 | 1000 |
5 | car_3 | 2021-07-31 | 304000 |
6 | car_4 | 2021-11-30 | 10500 |
7 | car_2 | 2021-12-01 | 2200 |
8 | car_1 | 2021-12-31 | 500 |
9 | car_2 | 2022-02-02 | 2400 |
10 | car_4 | 2022-02-07 | 10900 |
11 | car_1 | 2022-02-15 | 530 |
Now I need a MySQL (v 5.7) query that produces the following result:
car | year_2021_mileage | year_2022_mileage |
---|---|---|
car_1 | 480 | 30 |
car_2 | 1200 | 200 |
car_3 | 4000 | |
Note: the solution should include only selected cars (e.g., car_1, car_2, and car_3 but not car_4).
My MySQL (v 5.7) query nearly works, except for car_3, which only operated in 2021 and has no readings in 2022. The query drops car_3 from the results entirely, even though it should still report its 2021 mileage.
Note: data is added over time, so the auto-incrementing id can be used to find the earliest or latest entries for each car.
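To illustrate that note: MySQL 5.7 has no window functions, but the first and last reading per car in a period can still be found by grouping on car, taking MIN(id) and MAX(id), and joining back to car_data for the mileage on those rows. The sketch below runs this pattern against an in-memory SQLite copy of the sample data. SQLite is only a stand-in to keep it runnable; the SQL itself uses nothing beyond what MySQL 5.7 supports. The helper names make_db and first_and_last_2021 are hypothetical, and the query assumes ids grow with date within each car.

```python
import sqlite3

# Sample rows from the question: (id, car, date, mileage)
ROWS = [
    (1, "car_4", "2021-01-05", 10000),
    (2, "car_1", "2021-01-10", 20),
    (3, "car_3", "2021-01-11", 300000),
    (4, "car_2", "2021-01-31", 1000),
    (5, "car_3", "2021-07-31", 304000),
    (6, "car_4", "2021-11-30", 10500),
    (7, "car_2", "2021-12-01", 2200),
    (8, "car_1", "2021-12-31", 500),
    (9, "car_2", "2022-02-02", 2400),
    (10, "car_4", "2022-02-07", 10900),
    (11, "car_1", "2022-02-15", 530),
]


def make_db():
    """Build an in-memory SQLite stand-in for the MySQL car_data table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE car_data (id INTEGER PRIMARY KEY, car TEXT, date TEXT, mileage INTEGER)"
    )
    conn.executemany("INSERT INTO car_data VALUES (?, ?, ?, ?)", ROWS)
    return conn


# Groupwise MIN/MAX of id, then join back to fetch the mileage on those rows.
# Assumption: ids grow with date for each car (rows are appended over time).
FIRST_LAST_2021 = """
SELECT ids.car, first_row.mileage, last_row.mileage
FROM (SELECT car, MIN(id) AS first_id, MAX(id) AS last_id
      FROM car_data
      WHERE date >= '2021-01-01' AND date < '2022-01-01'
      GROUP BY car) AS ids
JOIN car_data AS first_row ON first_row.id = ids.first_id
JOIN car_data AS last_row ON last_row.id = ids.last_id
ORDER BY ids.car
"""


def first_and_last_2021(conn):
    """Map each car to (first 2021 mileage, last 2021 mileage)."""
    rows = conn.execute(FIRST_LAST_2021).fetchall()
    return {car: (first, last) for car, first, last in rows}


if __name__ == "__main__":
    print(first_and_last_2021(make_db()))
```

For car_1 this picks ids 2 and 8, i.e. mileages 20 and 500.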
Note: the actual data has many cars, with mileage reported on many days (i.e., not just at the start and end of the year). The solution should be able to restrict the results to a subset of all cars.
I derived my approach from the example and solution shown here for subtracting values in different rows of the same column.
Explanation of my strategy:
- t1 represents the most recent mileage reading from 2022
- t2 represents the last mileage reading from 2021 (end of year)
- t3 represents the first mileage reading from 2021 (start of year)
- Joining t2 and t3 with ON t1.id IS NOT NULL effectively creates a cross join with t1. I'm using this approach because my query is generated with the Python SQLAlchemy ORM package, which doesn't support a cross join method, but this approach seems to work.
- The subquery joins restrict t1, t2 and t3 to the appropriate row (date) for each car
SELECT t1.car,
t2.mileage - t3.mileage AS year_2021_mileage,
t1.mileage - t2.mileage AS year_2022_mileage
FROM car_data t1
JOIN car_data t2
ON t1.id IS NOT NULL
JOIN car_data t3
ON t1.id IS NOT NULL
JOIN (SELECT Max(anon.id) AS recent_id
FROM car_data AS anon
WHERE anon.car = 'car_1'
AND anon.date >= '2022-01-01'
OR anon.car = 'car_2'
AND anon.date >= '2022-01-01'
OR anon.car = 'car_3'
AND anon.date >= '2022-01-01'
GROUP BY anon.car) AS anon_1
ON anon_1.recent_id = t1.id
JOIN (SELECT Max(anon.id) AS end_of_year_id
FROM car_data AS anon
WHERE anon.car = 'car_1'
AND anon.date <= '2021-12-31'
OR anon.car = 'car_2'
AND anon.date <= '2021-12-31'
OR anon.car = 'car_3'
AND anon.date <= '2021-12-31'
GROUP BY anon.car) AS anon_2
ON anon_2.end_of_year_id = t2.id
JOIN (SELECT Min(anon.id) AS start_of_last_year_id
FROM car_data AS anon
WHERE anon.car = 'car_1'
AND anon.date >= '2021-01-01'
OR anon.car = 'car_2'
AND anon.date >= '2021-01-01'
OR anon.car = 'car_3'
AND anon.date >= '2021-01-01'
GROUP BY anon.car) AS anon_3
ON anon_3.start_of_last_year_id = t3.id
WHERE t1.car = t2.car
AND t1.car = t3.car
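A likely reason car_3 disappears: the anon_1 subquery only returns ids of 2022 readings, and car_3 has none, so the inner JOIN on anon_1.recent_id = t1.id removes every t1 row for car_3, together with the t2 and t3 rows paired with it. One possible restructuring, sketched below, drives the query from the 2021 rows and attaches the 2022 reading with LEFT JOINs, so a missing 2022 reading yields NULL instead of dropping the car. This is an illustration, not the original query: it runs against SQLite as a stand-in for MySQL 5.7, the names make_db and yearly_mileage_left_join are hypothetical, and it assumes every selected car has at least one 2021 reading (a car with only 2022 readings would be omitted).

```python
import sqlite3

# Sample rows from the question: (id, car, date, mileage)
ROWS = [
    (1, "car_4", "2021-01-05", 10000),
    (2, "car_1", "2021-01-10", 20),
    (3, "car_3", "2021-01-11", 300000),
    (4, "car_2", "2021-01-31", 1000),
    (5, "car_3", "2021-07-31", 304000),
    (6, "car_4", "2021-11-30", 10500),
    (7, "car_2", "2021-12-01", 2200),
    (8, "car_1", "2021-12-31", 500),
    (9, "car_2", "2022-02-02", 2400),
    (10, "car_4", "2022-02-07", 10900),
    (11, "car_1", "2022-02-15", 530),
]


def make_db():
    """Build an in-memory SQLite stand-in for the MySQL car_data table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE car_data (id INTEGER PRIMARY KEY, car TEXT, date TEXT, mileage INTEGER)"
    )
    conn.executemany("INSERT INTO car_data VALUES (?, ?, ?, ?)", ROWS)
    return conn


LEFT_JOIN_QUERY = """
SELECT y21.car,
       end_row.mileage - start_row.mileage AS year_2021_mileage,
       recent_row.mileage - end_row.mileage AS year_2022_mileage
FROM (SELECT car, MIN(id) AS start_id, MAX(id) AS end_id
      FROM car_data
      WHERE car IN ('car_1', 'car_2', 'car_3')
        AND date >= '2021-01-01' AND date <= '2021-12-31'
      GROUP BY car) AS y21
JOIN car_data AS start_row ON start_row.id = y21.start_id
JOIN car_data AS end_row ON end_row.id = y21.end_id
-- LEFT JOIN: a car with no 2022 reading keeps its row, with NULL for 2022
LEFT JOIN (SELECT car, MAX(id) AS recent_id
           FROM car_data
           WHERE car IN ('car_1', 'car_2', 'car_3')
             AND date >= '2022-01-01'
           GROUP BY car) AS y22
       ON y22.car = y21.car
LEFT JOIN car_data AS recent_row ON recent_row.id = y22.recent_id
ORDER BY y21.car
"""


def yearly_mileage_left_join(conn):
    """Return (car, year_2021_mileage, year_2022_mileage) rows."""
    return conn.execute(LEFT_JOIN_QUERY).fetchall()


if __name__ == "__main__":
    print(yearly_mileage_left_join(make_db()))
```

On the sample data this returns car_3 with 4000 for 2021 and None (NULL) for 2022, instead of omitting it.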
Please see the fiddle here, which can be used to test solutions.
Update: I received the following solution, which works on MySQL v8.0. It uses CTEs and the LAG window function, neither of which is supported in MySQL v5.7. I'm including it because I like its strategy of using the minimum and maximum mileage for each year, and it may serve as a good starting point for a solution that works on MySQL v5.7. There's a working fiddle of this solution (for MySQL v8.0) at this link.
with cte1 as (
-- CTE1 - get every combination of car and year to ensure no gaps
select D1.car, year(D2.date) year
from car_data D1
cross join car_data D2
group by D1.car, year(D2.date)
),
cte2 as (
-- CTE2 - join on the actual data and get the mileage at the start and end of the year
select C.car, year, min(mileage) start, max(mileage) end
from cte1 C
left join car_data D on D.car = C.car and year(D.date) = C.year
group by C.car, year
),
cte3 as (
-- CTE3 - use LAG to take last year's end mileage as this year's start mileage, where it exists
select car, year
, lag (end,1,start) over (partition by car order by car, year) start
, end
from cte2
)
select car
, sum(case when year = 2021 then end-start else null end) 2021_year_mileage
, sum(case when year = 2022 then end-start else null end) 2022_year_mileage
from cte3
where car='car_1' or car='car_2' or car='car_3'
group by car
order by car;
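The same min/max strategy can be sketched for MySQL 5.7, without CTEs or LAG, using conditional aggregation: one GROUP BY car pass, with CASE expressions selecting each year's readings. Like the CTE, it assumes mileage never decreases, so MIN and MAX are the first and last readings of a year. COALESCE starts 2022 from the last 2021 reading and falls back to the first 2022 reading when a car has no 2021 data; that fallback is an assumed behaviour (the CTE version would return NULL in that case). The car list is bound as parameters so any subset can be chosen at run time. SQLite stands in for MySQL 5.7 to keep the sketch runnable; with a MySQL driver such as PyMySQL the placeholder would be %s instead of ?. The names make_db and yearly_mileage are hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, car, date, mileage)
ROWS = [
    (1, "car_4", "2021-01-05", 10000),
    (2, "car_1", "2021-01-10", 20),
    (3, "car_3", "2021-01-11", 300000),
    (4, "car_2", "2021-01-31", 1000),
    (5, "car_3", "2021-07-31", 304000),
    (6, "car_4", "2021-11-30", 10500),
    (7, "car_2", "2021-12-01", 2200),
    (8, "car_1", "2021-12-31", 500),
    (9, "car_2", "2022-02-02", 2400),
    (10, "car_4", "2022-02-07", 10900),
    (11, "car_1", "2022-02-15", 530),
]


def make_db():
    """Build an in-memory SQLite stand-in for the MySQL car_data table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE car_data (id INTEGER PRIMARY KEY, car TEXT, date TEXT, mileage INTEGER)"
    )
    conn.executemany("INSERT INTO car_data VALUES (?, ?, ?, ?)", ROWS)
    return conn


def yearly_mileage(conn, cars):
    """Per-car mileage driven in 2021 and 2022 for the given cars."""
    # One "?" per car; the car names themselves are bound, not interpolated.
    placeholders = ", ".join("?" for _ in cars)
    sql = f"""
    SELECT car,
           MAX(CASE WHEN date >= '2021-01-01' AND date < '2022-01-01' THEN mileage END)
         - MIN(CASE WHEN date >= '2021-01-01' AND date < '2022-01-01' THEN mileage END)
             AS year_2021_mileage,
           MAX(CASE WHEN date >= '2022-01-01' AND date < '2023-01-01' THEN mileage END)
         - COALESCE(
               -- start of 2022 = last 2021 reading, else first 2022 reading
               MAX(CASE WHEN date >= '2021-01-01' AND date < '2022-01-01' THEN mileage END),
               MIN(CASE WHEN date >= '2022-01-01' AND date < '2023-01-01' THEN mileage END))
             AS year_2022_mileage
    FROM car_data
    WHERE car IN ({placeholders})
    GROUP BY car
    ORDER BY car
    """
    return conn.execute(sql, list(cars)).fetchall()


if __name__ == "__main__":
    print(yearly_mileage(make_db(), ["car_1", "car_2", "car_3"]))
```

Calling yearly_mileage(make_db(), ["car_1", "car_2", "car_3"]) returns [('car_1', 480, 30), ('car_2', 1200, 200), ('car_3', 4000, None)], which matches the desired table, with None for car_3's missing 2022 value and car_4 excluded by the filter.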