1

I have a data.table with time series and am trying to compute several aggregations on overlapping time intervals, e.g. in February I would like to have the mean on data from January and February, in March - February and March and so on.

I was able to calculate this using a for loop, but as my data.table contains over 300 000 rows and several variables, I was wondering if there is a more effective/elegant way to achieve this. I tried to use rollapply from the zoo package in various ways but did not get the expected result.

library(data.table)
library(zoo)

# sample data 
dt <- data.table(day = Sys.Date() - 100:1, var = 1:100)
dt[, month := month(day)]

# by 1 month is pretty obvious 
dt[, mean(var), by = month]
   month   V1
1:     7  1.5
2:     8 18.0
3:     9 48.5
4:    10 79.0
5:    11 97.5

# by 2 months - solution using for loop = expected result
for (m in unique(dt[, month])[-1]) {
    dt[month == m, res := mean(dt[month %in% c(m, m-1), var])]
}
dt[, unique(res), by = month]
   month V1
1:     7 NA
2:     8 17
3:     9 33
4:    10 64
5:    11 82

# one of the things I tried
dt[, res := NULL]
lw <- dt[, .N, by = month][, N]
lw <- as.list(lw[-1] + lw[-length(lw)])
dt[, rollapplyr(var, width = lw, mean, fill = NA), by = month]
MRau
  • 336
  • 1
  • 8
  • Rolling functions are not yet implemented in `data.table` ([open issue](https://github.com/Rdatatable/data.table/issues/2778)). See also [R data.table sliding window](https://stackoverflow.com/a/12214362/1851712) and links therein and links therein...Posts on [`RcppRoll`](https://stackoverflow.com/search?tab=votes&q=%5br%5d%20rcpproll) may be useful. – Henrik Nov 07 '18 at 11:37
  • by= does not take a function in rollapply. It must be a number. Also if you specify a list for the width= then it is recycled to the length of the data and the elements are regarded as offsets which is not likely what you want. – G. Grothendieck Nov 07 '18 at 14:54
  • you might want to look into RcppRoll [link](https://cran.r-project.org/web/packages/RcppRoll/RcppRoll.pdf) and specifically into the function "roll_mean" – Hjalmar Nov 07 '18 at 19:39
  • Thank you for your comments, but `RcppRoll` does not solve the issue. It seems that the `data.table` open issue is about implementing rolling functions in general so that `zoo` won't be needed, but I doubt than it would cover my problem. – MRau Nov 08 '18 at 07:34

0 Answers0