1

I have a table, car_data, as follows:

id car date mileage
1 car_4 2021-01-05 10000
2 car_1 2021-01-10 20
3 car_3 2021-01-11 300000
4 car_2 2021-01-31 1000
5 car_3 2021-07-31 304000
6 car_4 2021-11-30 10500
7 car_2 2021-12-01 2200
8 car_1 2021-12-31 500
9 car_2 2022-02-02 2400
10 car_4 2022-02-07 10900
11 car_1 2022-02-15 530

Now I need a MySQL (v 5.7) query that produces the following result:

car year_2021_mileage year_2022_mileage
car_1 480 30
car_2 1200 200
car_3 4000

Note: the solution should include only selected cars (e.g., car_1, car_2, and car_3 but not car_4).

My MySQL (v 5.7) query nearly works, except for car_3. car_3 only operated in 2021. It did not operate in 2022. However, the query excludes car_3 from the results, even for 2021.

Note: data is added over time so the auto-incrementing id may be used to find earliest or latest entries in the query.

Note: the actual data has lots of cars with mileage reporting on many days (e.g., not just start/end of year). The desired solution should be able to filter the cars to a subset of all cars.

I derived my approach from the example and solution shown here to substract values from different rows within the same column.

Explanation of my strategy:

  • t1 represents most recent mileage from 2022
  • t2 represents mileage from end of 2021
  • t3 represents mileage from start of 2021
  • Joining approach for t2 and t3 effectively create a cross join with t1 (I'm using this approach because my query is generated using Python SQLAlchemy ORM package which doesn't support cross join method but this approach seems to work)
  • The subquery joins restrict the t1, t2, t3 results to the appropriate date for each car
    SELECT t1.car,
       t2.mileage - t3.mileage AS year_2021_mileage,
       t1.mileage - t2.mileage AS year_2022_mileage

    FROM   car_data t1
       JOIN car_data t2
         ON t1.id IS NOT NULL
       JOIN car_data t3
         ON t1.id IS NOT NULL

       JOIN (SELECT Max(anon.id) AS recent_id
             FROM   car_data AS anon
             WHERE  anon.car = 'car_1'
                    AND anon.date >= '2022-01-01'
                     OR anon.car = 'car_2'
                        AND anon.date >= '2022-01-01'
                     OR anon.car = 'car_3'
                       AND anon.date >= '2022-01-01'
             GROUP  BY anon.car) AS anon_1
         ON anon_1.recent_id = t1.id

       JOIN (SELECT Max(anon.id) AS end_of_year_id
             FROM   car_data AS anon
             WHERE  anon.car = 'car_1'
                    AND anon.date <= '2021-12-31'
                     OR anon.car = 'car_2'
                        AND anon.date <= '2021-12-31'
                     OR anon.car = 'car_3'
                        AND anon.date <= '2021-12-31'
             GROUP  BY anon.car) AS anon_2
         ON anon_2.end_of_year_id = t2.id

       JOIN (SELECT Min(anon.id) AS start_of_last_year_id
             FROM   car_data AS anon
             WHERE  anon.car = 'car_1'
                    AND anon.date >= '2021-01-01'
                     OR anon.car = 'car_2'
                        AND anon.date >= '2021-01-01'
                     OR anon.car = 'car_3'
                        AND anon.date >= '2021-01-01'
             GROUP  BY anon.car) AS anon_3
         ON anon_3.start_of_last_year_id = t3.id

    WHERE  t1.car = t2.car
       AND t1.car = t3.car

Please see the fiddle here which can be used to test solutions.

Update: I received the following solution which is compatible with MySQL v8.0. It uses CTE and LAG which are not supported in MySQL v5.7. I'm including it because I like the strategy of using the maximum and minimum mileage for each year. It may serve as good starting point for a solution that works on MySQL v5.7. There's a working fiddle of this solution (for MySQL v8.0) at this link

  with cte1 as (
   -- CTE1 - get every combination of car and year to ensure no gaps
   select D1.car, year(D2.date) year
    from car_data D1
    cross join car_data D2
    group by D1.car, year(D2.date)
),
cte2 as (
    -- CTE2 - join on the actual data and get the mileage at the start and end of the year
    select C.car, year, min(mileage) start, max(mileage) end
    from cte1 C
    left join car_data D on D.car = C.car and year(D.date) = C.year
    group by C.car, year
),
cte3 as (
    -- CTE3 use lag to get the mileage at the end of last year as the start data for this year where it exists
    select car, year
      , lag (end,1,start) over (partition by car order by car, year) start
      , end
    from cte2
)

select car
  , sum(case when year = 2021 then end-start else null end) 2021_year_mileage
  , sum(case when year = 2022 then end-start else null end) 2022_year_mileage
from cte3
where car='car_1' or car='car_2' or car='car_3'
group by car
order by car;

davidsbro
  • 2,761
  • 4
  • 23
  • 33
STEM FabLab
  • 380
  • 3
  • 13
  • I also had an issue with the fiddle on http://sqlfiddle.com/. The new fiddle is on https://www.db-fiddle.com/. Hopefully that works. Also, I'm using MySQL version 5.7. – STEM FabLab Feb 16 '22 at 16:56

0 Answers0