
I can implement a rolling window by repeatedly 'shifting' my data, and then summarising 'row-wise', but this seems cumbersome and not easily generalisable to different window sizes.

#' Generate dummy data
library(data.table)
set.seed(42)
d <- data.table(id=rep(letters[1:2], each=5), time=rep(1:5,times=2), x=sample.int(10,10,replace=T))

The data looks like this:

id  time    x
a   1   10
a   2   10
a   3   3
a   4   9
a   5   7
b   1   6
b   2   8
b   3   2
b   4   7
b   5   8

Now take a rolling maximum over the current value of x and the previous 2 values (for each id).

#' Now you want to take the maximum of the previous 2 x values (by id)
#' I can do this by creating shifted lagged versions
d[, x.L1 := shift(x,1,type='lag'), by=id]
d[, x.L2 := shift(x,2,type='lag'), by=id]
d[, x.roll.max := max(x, x.L1, x.L2, na.rm=TRUE), by=.(id,time)]

This generates:

id  time    x   x.L1    x.L2    x.roll.max
a   1   10  NA  NA  10
a   2   10  10  NA  10
a   3   3   10  10  10
a   4   9   3   10  10
a   5   7   9   3   9
b   1   6   NA  NA  6
b   2   8   6   NA  8
b   3   2   8   6   8
b   4   7   2   8   8
b   5   8   7   2   8

I am assuming there is a much better way.

drstevok
  • Perhaps look into the various `roll*` functions in packages **zoo** and **RcppRoll**. – Josh O'Brien Feb 09 '17 at 00:27
  • @JoshO'Brien: sorry, fixed data. I'd seen posts using zoo and RcppRoll but sort of thought this would be the sort of thing that should work well in data.table. – drstevok Feb 09 '17 at 00:30
  • Your code simplifies to `d[, do.call(pmax, c(shift(x, 0:2, type='lag'), na.rm=TRUE)), by=id]`, but I guess this is still less efficient than a specialized roller like RcppRoll. – Frank Feb 09 '17 at 00:55
  • @Frank: thanks - implemented your latter suggestion below – drstevok Feb 09 '17 at 09:41
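
The `shift`/`pmax` simplification from the comment above can be wrapped so that the window size becomes a parameter, which is the generalisation the question asks for. The following is only a minimal sketch: the helper name `roll_max_shift` is made up for illustration, and the data is typed in by hand from the question's table so the result does not depend on the random number generator.

```r
library(data.table)

# Data copied from the table in the question.
d <- data.table(
  id   = rep(c("a", "b"), each = 5),
  time = rep(1:5, times = 2),
  x    = c(10L, 10L, 3L, 9L, 7L, 6L, 8L, 2L, 7L, 8L)
)

# Hypothetical helper (not part of data.table): rolling max over the current
# value and the previous (n - 1) values. Early rows use a partial window,
# because the NA lags are dropped by na.rm = TRUE.
roll_max_shift <- function(x, n) {
  lags <- shift(x, 0:(n - 1L), type = "lag")
  # shift() returns a bare vector rather than a list when a single lag is requested
  if (!is.list(lags)) lags <- list(lags)
  do.call(pmax, c(lags, list(na.rm = TRUE)))
}

# Window of 3 (x, lag 1, lag 2), computed separately for each id.
d[, x.roll.max := roll_max_shift(x, 3L), by = id]
print(d)
```

With a window of 3 this should reproduce the `x.roll.max` column shown in the question. It still builds one lagged copy of `x` per window position, so a dedicated rolling function is likely to be faster for large windows.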

3 Answers

So I followed @Frank's suggestion above and went to RcppRoll.

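library(Rcpp)      # loaded first to avoid the 'enterRNGScope' error described in the comment below
library(RcppRoll)  # provides roll_max()
d[, x.roll.max := roll_max(x, n=2L, align='right', fill=NA, na.rm=TRUE), by=id]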

And I guess I shouldn't have been trying to do it all in data.table, because this works very nicely.

id  time    x   x.roll.max
a   1   11  NA
a   2   12  12
a   3   4   12
a   4   10  10
a   5   8   10
a   6   7   8
b   1   9   NA
b   2   2   9
b   3   8   8
b   4   9   9
b   5   6   9
b   6   9   9
drstevok
  • Note: I called `library(Rcpp)` first because I was getting the error `function 'enterRNGScope' not provided by package 'Rcpp'`, which I assumed meant that the function from `Rcpp` was being masked, possibly by data.table (see http://stackoverflow.com/questions/21657575/what-does-this-mean-in-lme4-function-dataptr-not-provided-by-package-rcpp#23020525) – drstevok Feb 09 '17 at 09:54

As of data.table v1.12.4 (03 Oct 2019) the function frollapply for rolling computation of arbitrary R functions is available:

library(data.table)

set.seed(42)
d <- data.table(id = rep(letters[1:2], each = 5), time = rep(1:5, times = 2), x = sample.int(10, 10, replace = T))
d[, x.roll.max := frollapply(x = x, n = 2, max, fill = NA, align = "right", na.rm = TRUE), by = id]

    id time  x x.roll.max
 1:  a    1  1         NA
 2:  a    2  5          5
 3:  a    3  1          5
 4:  a    4  9          9
 5:  a    5 10         10
 6:  b    1  4         NA
 7:  b    2  2          4
 8:  b    3 10         10
 9:  b    4  1         10
10:  b    5  8          8
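
Note that with `fill = NA` the first `n - 1` rows of each id come back as `NA`, whereas the shift-based code in the question returns the maximum of whatever values are available so far. One way to get that behaviour is sketched below. It assumes `x` itself contains no missing values, it uses hand-typed data from the question's table instead of `sample.int`, and the column name `x.roll.max.partial` is invented for the example.

```r
library(data.table)

# Data copied from the table in the question, stored as doubles so that
# frollapply() and cummax() return the same type.
d <- data.table(
  id   = rep(c("a", "b"), each = 5),
  time = rep(1:5, times = 2),
  x    = c(10, 10, 3, 9, 7, 6, 8, 2, 7, 8)
)

n <- 3L  # current value plus the previous 2, as in the question

# Full windows only: the first n - 1 rows of each id are NA.
d[, x.roll.max := frollapply(x, n, max, na.rm = TRUE), by = id]

# For the incomplete windows at the start of each id, the window covers every
# value seen so far, so the running maximum cummax(x) is the partial-window max.
# Assumption: x has no NA values (cummax() would propagate them).
d[, x.roll.max.partial := fcoalesce(x.roll.max, cummax(x)), by = id]
print(d)
```

The second column should then match the `x.roll.max` column from the question, while the first keeps the stricter full-window behaviour.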
ismirsehregal

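I like Ulrich's TTR package. The call below gives you a running max over a window of 2. Note that, as written, it runs over the whole x column and does not separate the ids.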

TTR::runMax(d$x,2)
greengrass62
  • Nice tip about TTR which works nicely, but (unfortunately) doesn't handle missing values (no `na.rm=T` option) – drstevok Feb 09 '17 at 09:22