3

I am trying to create a SQL query that will pull the number of rows since the last maximum value within a windows function over the last 5 rows. In the example below it would return 2 for row 8. The max value is 12 which is 2 rows from row 8.

For row 6 it would return 5 because the max value of 7 is 5 rows away.

|ID  | Date       | Amount  
| 1  | 1/1/2019   |  7  
| 2  | 1/2/2019   |  3  
| 3  | 1/3/2019   |  4  
| 4  | 1/4/2019   |  1  
| 5  | 1/5/2019   |  1  
| 6  | 1/6/2019   |  12  
| 7  | 1/7/2019   |  2  
| 8  | 1/8/2019   |  4  

I tried the following:

SELECT ID, date, MAX(amount) 
OVER (ORDER BY date ASC ROWS 5 PRECEDING) mymax
FROM tbl 

This gets me to the max values but I am unable to efficiently determine how many rows away it is. I was able to get close using multiple variables within the SELECT but this did not seem efficient or scalable.

Dale K
  • 25,246
  • 15
  • 42
  • 71
JDG
  • 89
  • 1
  • 9

2 Answers2

2

You can calculate the cumulative maximum and then use row_number() on that.

So:

select t.*,
       row_number() over (partition by running_max order by date) as rows_since_last_max
from (select t.*, 
             max(amount) over (order by date rows between 5 preceding and current row) as running_max
      from tbl t
     ) t;

I think this works for your sample data. It might not work if you have duplicates.

In that case, you can use date arithmetic:

select t.*,
       datediff(day,
                max(date) over (partition by running_max order by date),
                date
               ) as days_since_most_recent_max5
from (select t.*, 
             max(amount) over (order by date rows between 5 preceding and current row) as running_max
      from tbl t
     ) t;

EDIT:

Here is an example using row number:

select t.*,
       (seqnum - max(case when amount = running_amount then seqnum end) over (partition by running_max order by date)) as rows_since_most_recent_max5
from (select t.*, 
             max(amount) over (order by date rows between 5 preceding and current row) as running_max,
             row_number() over (order by date) as seqnum
      from tbl t
     ) t;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for your reply, but I am still running into an issue. My "amounts" do have duplicate numbers which is causing issues. I then attempted the second example you provided, however there are gaps in the dates of my source data. This then yields inaccurate results. I really need the number of rows to the most recent max and not datediff. Any additional help would be greatly appreciated. – JDG Aug 06 '19 at 13:45
  • @JDG . . . You explicitly state that you want the previous five rows, so gaps in the date should not matter. If they are an issue, I would suggest that you ask a *new* question with appropriate sample data, desired results, and an explanation. – Gordon Linoff Aug 06 '19 at 14:19
  • Hi Gordon. That is correct, it should be the max from the previous 5 rows. The datediff funtion is counting the difference between dates and not the difference in rows. For example the dates could be in the sequence 10/1/2019, 10/2/2019, 10/5/2019, 10/10/2019. The datedff would provide a "days_since_most_recent_max5" that does not match the difference in rows. – JDG Aug 06 '19 at 15:04
  • @JDG . . . In that case, you can use `id` rather than `date` -- if your ids are gapless and always increment by 1. – Gordon Linoff Aug 06 '19 at 15:33
  • Yes, so thats what I originally tried in your first example. The issue is that there are duplicate "running_max" in the first part of the query. When the second part of the query looks for the row_number, it is selecting the incorrect record. In many cases the rows_since_last_max is returning values over 5. This is because its finding a duplicate running_max before the correct instance. – JDG Aug 06 '19 at 15:48
  • @ Gordon Linoff Please don't leave me hangin :) – JDG Aug 09 '19 at 20:18
  • @ Gordon Unfortunately this is not working as *rows_since_most_recent_max5* is always yielding zero. – JDG Aug 13 '19 at 14:40
  • @JDG . . . There should have been some conditionality there. I revised the edited answer. – Gordon Linoff Aug 13 '19 at 15:35
  • That worked, thanks! Just needed to add a closing parenthesis after `running_amount then seqnum end)` and change *running_amount* to *running_max* – JDG Aug 14 '19 at 19:29
0

It would be :

select *,ID- 
(
SELECT ID
FROM
  (
    SELECT
      ID,amount,
      Maxamount =q.mymax
    FROM
      Table_4 
  ) AS derived
WHERE
  amount = Maxamount
) as result
from (

    SELECT ID, date,
    MAX(amount) 
    OVER (ORDER BY date ASC ROWS 5 PRECEDING) mymax
    FROM Table_4 
)as q
Roxana Sh
  • 294
  • 1
  • 3
  • 14