3

i have a sliding window problem. specifically, i do not know where my window should start and where it should end. i do know the size of my interval/window.

i need to find the start/end of the window that delivers the best (or worst, depending on how you look at it) case scenario.

here is an example dataset:

 value  | tstamp
 100    | 2013-02-20 00:01:00
 200    | 2013-02-20 00:02:00
 300    | 2013-02-20 00:03:00
 400    | 2013-02-20 00:04:00
 500    | 2013-02-20 00:05:00
 600    | 2013-02-20 00:06:00
 500    | 2013-02-20 00:07:00
 400    | 2013-02-20 00:08:00
 300    | 2013-02-20 00:09:00
 200    | 2013-02-20 00:10:00
 100    | 2013-02-20 00:11:00

let's say i know that my interval needs to be 5 minutes. so, i need to know the value and timestamps included in the 5 minute interval where the sum of 'value' is the highest. in my above example, the rows from '2013-02-20 00:04:00' to '2013-02-20 00:08:00' would give me a sum of 400+500+600+500+400 = 2400, which is the highest value over 5 minutes in that table.

i'm not opposed to using multiple tables if needed. but i'm trying to find a "best case scenario" interval. results can go either way, as long as they net the interval. if i get all data points over that interval, it still works. if i get the start and end points, i can use those as well.

i've found several sliding window problems for SQL, but haven't found any where the window size is the known factor, and the starting point is unknown.

MT0
  • 143,790
  • 11
  • 59
  • 117
jasonmclose
  • 1,667
  • 4
  • 22
  • 38

1 Answers1

13
SELECT  *,
        (
        SELECT  SUM(value)
        FROM    mytable mi
        WHERE   mi.tstamp BETWEEN m.tstamp - '5 minute'::INTERVAL AND m.tstamp
        ) AS maxvalue
FROM    mytable m
ORDER BY
        maxvalue DESC
LIMIT   1

In PostgreSQL 11 and above:

SELECT  SUM(value) OVER (ORDER BY tstamp RANGE '5 minute' PRECEDING) AS maxvalue,
        *
FROM    mytable m
ORDER BY
        maxvalue DESC
LIMIT   1
Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • i tried it, and i got a result, but i have no clue what i'm looking at. where did you create your interval based upon the timestamps? i'm having a little trouble understanding the query. – jasonmclose Feb 20 '13 at 21:32
  • @jasonmclose: sorry, got your request wrong, I thought you needed 5 records, not 5 minutes. Please see the new query. – Quassnoi Feb 20 '13 at 21:35
  • 1
    this is it. this does exactly what i needed it to do. thank you so much for your help. – jasonmclose Feb 20 '13 at 22:09
  • This query has two flaws: it only works because the timestamps in the example dataset are always 1 minute apart, and its running time is quadratic. – Changaco Jul 22 '21 at 08:23
  • @Changaco: if the timestamp field is indexed, its running time is linearithmic, not quadratic. The query works regardless of the interval between the events. – Quassnoi Jul 24 '21 at 18:38
  • @Quassnoi No, it doesn't work regardless of the interval. For example if two of the timestamps are 3 minutes apart, then your query won't sum them. – Changaco Jul 26 '21 at 08:50
  • The solution I ended up using looks like this: `SELECT sum(value) OVER (ORDER BY tstamp RANGE '5 minutes' PRECEDING) FROM mytable`. – Changaco Jul 26 '21 at 08:56
  • @changaco: yes you're right, I'll fix it. Your query is good, and had PostgreSQL supported it back then, it would make a great answer. – Quassnoi Jul 26 '21 at 20:06