-2

I have a table in MySQL 8.0 with the following structure.

CREATE TABLE `rg` ( `name` text, 
                    `zip` text,
                    `Year` bigint,
                    `r` double,
                    `geom` GEOMETRY
                  );

INSERT INTO `rg` (name, zip, Year, r, geom) 

VALUES 

('AB, NM', '87105', '2015', '10', 'POLYGON ((35.066062 -106.700005))'),

('AB, NM', '87105', '2016', '20', 'POLYGON ((35.067912 -106.700884))'),

('AB, NM', '87105', '2017', '30', 'POLYGON ((35.067912 -106.700884))')

How do I write a query that returns the percentage difference of r between the latest 2 available years? I group by name, zip and geom columns.

select name, zip, ST_AsText(geom) as geom, ....
from rg
GROUP BY name, zip, geom

Expected output:

AB, NM 87105 POLYGON ((35.067912 -106.700884)) 50 

i.e difference between r values 20, 30 of 2016 and 2017 is 50.

kms
  • 1,810
  • 1
  • 41
  • 92
  • Percentage difference of what? – Barmar Sep 05 '22 at 21:47
  • Please add what you expect the result set to be. – griv Sep 05 '22 at 21:53
  • @Barmar of values in column `r` of the latest 2 available Years. – kms Sep 05 '22 at 21:53
  • @GRIV I have updated the post with expected output. – kms Sep 05 '22 at 21:56
  • Join two subqueries, one that gets the last row in each group, the other that gets the second-to-last row in each group. Then calculate the percentage difference between `r` from each subquery. – Barmar Sep 05 '22 at 21:56
  • 1
    See https://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group-mysql?rq=1 for how to get the last row, and https://stackoverflow.com/questions/70436765/how-can-i-select-the-second-to-last-rows-in-a-mysql-table-grouped-by-column for how to get the 2nd-to-last row. – Barmar Sep 05 '22 at 21:57
  • @Barmar are you able to write an answer ? – kms Sep 05 '22 at 22:01
  • 1
    You should make some attempt to solve it yourself first. – Barmar Sep 05 '22 at 22:02

2 Answers2

1

Use two CTE's, one to get the previous max year, and the other for the max year. JOIN them together using name and zip, and run the difference calculation (mentioned as your expected output above) to get the percentage change between latest two available years.

Calculation:

((maxYear.r-previousMaxYear.r)/previousMaxYear.r * 100)

WITH 
previousMaxYear (name, zip, geom, r) AS (
SELECT name, zip, geom, r FROM rg WHERE year = (SELECT MAX(year) - 1 FROM rg LIMIT 1)
),
maxYear (name, zip, geom, r) AS (
SELECT name, zip, geom, r FROM rg WHERE year = (SELECT MAX(year) FROM rg LIMIT 1)
)
SELECT a.name, a.zip, ST_AsText(a.geom) as geom, CONCAT(ROUND((a.r-b.r)/b.r * 100), '%') AS 'Difference'
FROM maxYear a
  INNER JOIN previousMaxYear b ON
  a.name = b.name
  AND 
  a.zip = b.zip

Result:

name zip geom Difference
AB, NM 87105 POINT(35.067912 -106.700884) 50%

db<>fiddle here.

UPDATE:

Since, as you stated, the previous year may not always be MAX(year) - 1, you can use one CTE to partition it by year DESC in order to get the sequential number for each row.

Then, JOIN the CTE with itself (self join):

  • JOIN the max date (alias a) WHERE row number = 1 (RN=1).
  • JOIN the next max date (alias b) WHERE row number = 2 (RN=2).

Finally, LIMIT 1 row to return the values from alias a.

WITH CTE AS (SELECT name, zip, geom, r, ROW_NUMBER() OVER (ORDER BY year DESC) AS RN FROM rg)
SELECT
  a.name, 
  a.zip, 
  ST_AsText(a.geom) AS geom,
  CONCAT(ROUND((a.r-b.r)/b.r * 100), '%') AS 'difference'
FROM CTE
INNER JOIN (CTE AS a)
    ON (a.name = CTE.name 
        AND a.zip = CTE.zip 
        AND a.RN = 1)
INNER JOIN (CTE as b)
    ON (b.name = CTE.name 
        AND b.zip = CTE.zip 
        AND b.RN = 2)
LIMIT 1

db<>fiddle here.

griv
  • 2,098
  • 2
  • 12
  • 15
  • This makes sense, one case to consider here is that `previousMaxYear` may not always be `Max(year) - 1`. i.e `Max(year)` could be `2017` and previous available year could be `2015`. Would the query work for such cases? – kms Sep 05 '22 at 23:20
  • **No** it will not work, but I've updated my answer to include a solution that will account for the year not being `MAX(year) - 1`. – griv Sep 06 '22 at 00:07
0

Your math is a bit off

vut the query would look like, but 50 i can't get

The window function LEAD gives you the next value in Line. you can use it to make mathematical operation

Row_number gives you the row number also for the parttiion and order given in the window function.

As you are only interested in the last two years, we sort the rows by the YEAR and want the partition of name and zip

The calculation of the percentage is

r / r(from the last year) * 100

To get your 50 pro cent growth you need to subtract 100 from the actual per centage

r / r(from the last year) * 100 - 100

OR

r(from the last eyxr) / r * 100

But as i can't get 50 per cent i added both calcculations, select the one you want


WITH CTE AS (SELECT
name, zip, Year, r, geom,
   LEAD(r) OVER (PARTITION BY name, zip ORDER BY YEAR DESC) / r * 100 as lasttwoyears
  , ROW_NUMBER() OVER (PARTITION BY name, zip ORDER BY YEAR DESC) rn
FROM rg)
SELECT name, zip,geom, lasttwoyears  FROM CTE WHERe rn = 1
name zip geom lasttwoyears
AB, NM 87105 000000000101000000c3802557b18841408a929048dbac5ac0 66.66666666666666

WITH CTE AS (SELECT
name, zip, Year, r, geom,
   r / LEAD(r) OVER (PARTITION BY name, zip ORDER BY YEAR DESC)  * 100 as lasttwoyears
  , ROW_NUMBER() OVER (PARTITION BY name, zip ORDER BY YEAR DESC) rn
FROM rg)
SELECT name, zip,geom, lasttwoyears  FROM CTE WHERe rn = 1
name zip geom lasttwoyears
AB, NM 87105 000000000101000000c3802557b18841408a929048dbac5ac0 150

WITH CTE AS (SELECT
name, zip, Year, r, geom,
   r / LEAD(r) OVER (PARTITION BY name, zip ORDER BY YEAR DESC)  * 100 - 100 as lasttwoyears
  , ROW_NUMBER() OVER (PARTITION BY name, zip ORDER BY YEAR DESC) rn
FROM rg)
SELECT name, zip,geom, lasttwoyears  FROM CTE WHERe rn = 1
name zip geom lasttwoyears
AB, NM 87105 000000000101000000c3802557b18841408a929048dbac5ac0 50

fiddle

nbk
  • 45,398
  • 8
  • 30
  • 47
  • can you add a bit of explanation for how the query works? – kms Sep 05 '22 at 22:11
  • the math is simple and the function self eyplanatory – nbk Sep 05 '22 at 22:23
  • the partition bys should include geom, from the question – ysth Sep 06 '22 at 01:01
  • 1
    @kms it uses the LEAD() and ROW_NUMBER() window functions. ROW_NUMBER provides a ranking of all rows with the same column values given in the partition by clause, in the order given by the order by clause. so the row with the highest year for a given name,zip will have rn 1; later, the results will be filtered to just those rows. LEAD gives the value for the given expression from the following row with the same name,zip and in order by year desc, so for the row with rn=1, LEAD will give the r value from the next lower year – ysth Sep 06 '22 at 01:07
  • 1
    it then calculates the percentage the later year is of the earlier year, so for 30 and 20, 150%. it sounds like you only want the increase (50%) so you would subtract 100 `r / LEAD(r) OVER (PARTITION BY name, zip, geom ORDER BY YEAR DESC) * 100 - 100` – ysth Sep 06 '22 at 01:09
  • @ysth adding geom to the partition by, brings a problem as the sorting order gets lost – nbk Sep 06 '22 at 10:04
  • @nbk not sure what you mean; they want only one row for each distinct (name, zip, geom), and don't specify any order. and want to compare the most recent two years for each (name, zip, geom) – ysth Sep 06 '22 at 10:47
  • @ysth use my fiddle and add geom tothe partiton by to see – nbk Sep 06 '22 at 10:54
  • @nbk added: https://dbfiddle.uk/x7DCAtC8 (also added the math I think they want: percentage *increase*, so - 100) – ysth Sep 06 '22 at 11:02