I'm lookin for help on a question very closely related to optimized rolling functions on irregular time series with time-based window.
I want to calculate my function (in my example below it is sum, but could just as easily be min or max or median...) on the timesteps up to a certain number of minutes ago, but not beyond this number. So if my duration of interest is 8 mins and I have timesteps of 2,2,5 for the third time I only want to sum the values for the last 2 numbers (which gives a total time of 7) not including the first which would give a total time of 9. So basically I want as close to my duration as possible without going over.
For context the values are rainfall data and mindiff is the recording interval. So if I have a 5 min timestep the rain could have occurred at any time in that 5 mins. I want to find only the rain I can guarantee has fallen in the last 8 mins, so any time step that starts before 8 mins ago I cannot be sure and I want to exclude.
The code below is SO close to what I want (adjusted from Uwe's answer to question mentioned above) but it includes one point before unless the interval precisely equals my duration. I'd also like it to go to NA if the timestep is greater than the duration, but that's very easy to implement afterwards.
Example with desired output at the end:
library(tibble)
library(data.table)
set.seed(1)
testdf <- tibble(mindiff = c(1,2,1,2,5,2,2,2,11,5,3,2,0,1),
DateTime = as.POSIXct("2019-01-01 00:00", z = "Etc/GMT-10")+(cumsum(mindiff)*60),
Value = round(runif(14,0,5),1))
dur8= 60*8
testdf[["Total8Min"]] <- setDT(testdf)[, rn := .I][.(ur = rn, ud = DateTime, ld = DateTime - dur8),
on = .(rn <= ur, DateTime <= ud, DateTime > ld),
sum(as.double(Value)), by = .EACHI]$V1
testdf$desiredOut <- c(1.3,3.2,6.1,10.6,8.4,5.5,9.2,12.5,NA, 0.3,1.3,1.9,5.3,7.2)
Created on 2019-08-21 by the reprex package (v0.3.0)
I am a bit of a novice at data.table
, I'm much more used to tidyverse
so happy for a tidyverse
approach if someone has one.
To explain further in response to comments: In my desired output on row 6 I only want the output to be 5.5 which is what has fallen in the last 2 mins (from 00:11 to 00:13) and in the 5 mins before that (from 00:06 to 00:11). I do not want to include the value in row 4 as this rain could have fallen any time from 00:04 to 00:06 - potentially all prior to 00:05 which is 8 mins before my "now" time on row 6.