
I want to apply a function over a rolling window of a data.table with rollapply. Inside the function I would like to work with the data.table subset for that window, so that the example below works.

library(zoo)
library(data.table)

dt <- data.table(i = 1:100,
                 x = sample(1:10, 100, replace = TRUE),
                 y = sample(1:10, 100, replace = TRUE))

rollapply(dt, width=10, FUN = function(dt_slice) dt_slice[, mean(x == y)])
IceCreamToucan
Benni
  • Is this what you want? `dt[, rollapply(x == y, width = 10, FUN = mean)]`? – IceCreamToucan Jun 04 '19 at 13:35
  • rollapply() from package zoo? What is your expected output? – s_baldur Jun 04 '19 at 13:38
  • @IceCreamToucan although the example works it's unfortunately not that simple for my application. I really need to be able to work with a subset in FUN – Benni Jun 04 '19 at 13:48
  • How is `dt_slice` defined, please? What result do you expect if `dt_slice` has fewer rows than the width of the rolling window? – Uwe Jun 04 '19 at 14:03
  • Hopefully there will be `frollapply` soon to apply arbitrary R function over rolling window, for status see https://github.com/Rdatatable/data.table/pull/3600 – jangorecki Jun 06 '19 at 04:25

2 Answers


You can use rollapply, or sapply/outer, to build a matrix of row indices (one row per window) and then apply the operation you want over that matrix:

inds <- rollapply(seq_len(nrow(dt)), width = 10, FUN = I)
# or inds <- t(sapply(seq_len(1 + nrow(dt) - 10) - 1, `+`, 1:10))
# or inds <- outer(seq_len(1 + nrow(dt) - 10) - 1, 1:10, `+`)
# or inds <- embed(1:100, 10)[, 10:1] # thanks @Frank
apply(inds, 1, function(i) dt[i, mean(x == y)])

#  [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
# [20] 0.0 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
# [39] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 0.1 0.1
# [58] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0
# [77] 0.1 0.1 0.1 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.1 0.1 0.1 0.0 0.0

Although if the operation is as simple as in this example, you can also do:

dt[, rollapply(x == y, width = 10, FUN = mean)]
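The index-window idea above can also be wrapped in a small reusable helper, so that FUN receives an actual data.table slice as the question asks. This is only a sketch: `roll_dt` is a hypothetical name (it is not part of zoo or data.table), it uses base R plus data.table only, and it assumes incomplete windows should simply be skipped.

```r
library(data.table)

# Hypothetical helper (not part of data.table or zoo): calls FUN on each
# window of `width` consecutive rows of `dt` and returns the results as a list.
roll_dt <- function(dt, width, FUN) {
  n_windows <- nrow(dt) - width + 1L
  # Assumption: tables shorter than the window yield no results
  if (n_windows < 1L) return(list())
  lapply(seq_len(n_windows), function(s) {
    rows <- s:(s + width - 1L)  # row numbers of the current window
    FUN(dt[rows])               # FUN sees a data.table with `width` rows
  })
}

set.seed(17)
dt <- data.table(i = 1:100,
                 x = sample(1:10, 100, replace = TRUE),
                 y = sample(1:10, 100, replace = TRUE))

res <- unlist(roll_dt(dt, 10, function(dt_slice) dt_slice[, mean(x == y)]))
length(res)  # 91 complete windows for 100 rows and width 10
```

Because each window is a real data.table, FUN can use several columns, grouping, or joins, at the cost of one subset per window.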
IceCreamToucan
  • More data.tableish, maybe: `m = melt(inds); dt[m$value, .(mean(x == y)), by=m$Var1]`. Also `embed` from base R can be used to make `inds`, I guess – Frank Jun 04 '19 at 14:51

Thanks to @jangorecki for pointing out the frollapply function, a welcome addition to the data.table package. For your question, you would run the following:

library(data.table)
set.seed(17)
dt <- data.table(i = 1:100,
                 x = sample(1:10, 100, replace = TRUE),
                 y = sample(1:10, 100, replace = TRUE))
# Flag the rows where x and y match
dt[, index := x == y]
# Rolling mean of the flag over 10 rows; the first 9 values are NA
dt[, MA := frollapply(index, 10, mean)]
head(dt, 12)
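Since the question was about working with a subset of the table inside FUN, one possible variation is to roll over the row numbers instead of a precomputed column and subset the table inside the function. This is a sketch under assumptions: the variable name `ma_rows` is made up here, and it relies on frollapply handing FUN the window values as a plain numeric vector and on FUN returning a single number.

```r
library(data.table)

set.seed(17)
dt <- data.table(i = 1:100,
                 x = sample(1:10, 100, replace = TRUE),
                 y = sample(1:10, 100, replace = TRUE))

# Roll over row numbers; inside FUN, `idx` holds the 10 row numbers of the
# current window and is used to subset the table.
ma_rows <- frollapply(seq_len(nrow(dt)), 10,
                      function(idx) dt[idx, mean(x == y)])

head(ma_rows, 12)  # the first 9 entries are NA (incomplete windows)
```

The function body can be replaced by any expression on `dt[idx]` that returns one number per window.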