
I have a simple data.frame on which I want to compute some summary statistics on a rolling basis. For example, a rolling median over a window of five observations (two lags, the current observation, and two leads) is obtained with

library(dplyr)
library(zoo)  # provides rollapply()

x <- data.frame("vals" = rnorm(3e04))
y <- x %>%
  mutate(med5 = rollapply(data = vals,
                          width = 5,
                          FUN = median,
                          align = "center",
                          fill = NA,
                          na.rm = TRUE))

However, I would like to exclude the current row from this computation. I found the following approach:

z <- x %>% 
      mutate(N=1:n()) %>% 
      do(data.frame(., prmed = sapply(.$N, function(i) median(.$vals[.$N %in% c((i - 2):(i - 1), (i + 1):(i + 2))]))))

This does what I want, if I subsequently set the first two values to NA.

So far so good. The only problem is that the latter approach is terribly slow compared to rollapply.

Is there a way to achieve the outcome of the latter with the speed of the former?

Akkariz

2 Answers


One solution is to keep the five-observation window and, inside FUN, drop its third element, which is the current row of the calculation.

library(dplyr)
library(zoo)

set.seed(124)

x <- data.frame("vals" = rnorm(3e04))
y <- x %>%
  mutate(med5 = rollapply(data = vals, 
                          width = 5, 
                          FUN = function(x) median(x[-3], na.rm = TRUE), 
                          align = "center", 
                          fill = NA))

head(y)
#          vals      med5
# 1 -1.38507062        NA
# 2  0.03832318        NA
# 3 -0.76303016 0.1253147
# 4  0.21230614 0.3914015
# 5  1.42553797 0.4562678
# 6  0.74447982 0.4562678
www
    Works like a charm, thanks! Elegant, straightforward and easily generalisable to functions other than the median. – Akkariz Dec 07 '17 at 17:28

The width= argument of rollapply can be a one element list containing a vector of offsets.

y <- x %>%
  mutate(med5 = rollapply(data = vals, 
                          width = list(c(-2, -1, 1, 2)),
                          FUN = median,
                          na.rm = TRUE,
                          fill = NA))

Note that align = "center" is the default and so does not have to be specified. In addition, if we use offsets then align= is ignored. For safety, TRUE should be written out in full since T can also be a variable name.
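As a quick sanity check, the sketch below compares the index-dropping approach (x[-3] inside FUN) with the offset-list approach on a small hand-picked vector. The vector vals and the names by_index and by_offset are made up for illustration and are not part of either answer.

```r
library(zoo)

# Small illustrative vector (hypothetical data, chosen so the medians are easy to check by hand)
vals <- c(10, 1, 7, 3, 9, 4, 8)

# Approach 1: five-wide centred window, drop the third (current) element inside FUN
by_index <- rollapply(vals,
                      width = 5,
                      FUN = function(w) median(w[-3]),
                      align = "center",
                      fill = NA)

# Approach 2: offsets relative to the current row, skipping offset 0
by_offset <- rollapply(vals,
                       width = list(c(-2, -1, 1, 2)),
                       FUN = median,
                       fill = NA)

# Both should agree; e.g. for position 3 the neighbours are 10, 1, 3, 9, whose median is 6
stopifnot(isTRUE(all.equal(by_index, by_offset)))
print(by_index)
```

The offset form tends to generalise more directly to asymmetric windows, since only the offset vector changes, whereas the index-dropping form needs the dropped position adjusted to match the window width and alignment.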

G. Grothendieck