I have a data.table
with time series and am trying to compute several aggregations on overlapping time intervals, e.g. in February I would like to have the mean on data from January and February, in March - February and March and so on.
I was able to calculate this using a for loop, but as my data.table
contains over 300 000 rows and several variables, I was wondering if there is a more effective/elegant way to achieve this. I tried to use rollapply
from the zoo
package in various ways but did not get the expected result.
library(data.table)
library(zoo)
# sample data
dt <- data.table(day = Sys.Date() - 100:1, var = 1:100)
dt[, month := month(day)]
# by 1 month is pretty obvious
dt[, mean(var), by = month]
month V1
1: 7 1.5
2: 8 18.0
3: 9 48.5
4: 10 79.0
5: 11 97.5
# by 2 months - solution using for loop = expected result
for (m in unique(dt[, month])[-1]) {
dt[month == m, res := mean(dt[month %in% c(m, m-1), var])]
}
dt[, unique(res), by = month]
month V1
1: 7 NA
2: 8 17
3: 9 33
4: 10 64
5: 11 82
# one of the things I tried
dt[, res := NULL]
lw <- dt[, .N, by = month][, N]
lw <- as.list(lw[-1] + lw[-length(lw)])
dt[, rollapplyr(var, width = lw, mean, fill = NA), by = month]