0

I have the following data:

y <- data.table(cbind(week = rep(1:61,5352),
ID = rep(1:5352, each = 61), w = runif(326472), v = runif(326472)))
y$v[sample(1:326472, 10000, replace=FALSE)] <- NA

for which I'm running the code bellow that creates a rolling mean of variable v, ignoring outliers and NAs. The code is working, but with poor perfomance. I'm sure there is more efficient way to run it using apply or something similar, but I've been unsuccessful in creating a faster version. Can anyone shed some light on how to make it more efficient?

IDs <- unique(y$ID)
y$vol_m12 <- 0

for (i in 1:length(IDs)) {
  x <- y[ID==IDs[i]]

  outlier <- 0.2
  w_outlier <- quantile(x$w, c(outlier), na.rm = T)
  v_outlier <-quantile(x$v, c(1 - outlier), na.rm = T)

# Ignore outliers      
  x$v_temp <- x$v
  x$v_temp[((x$v_temp >= v_outlier)
                 & (x$w <= w_outlier))] <- NA

# Creating rolling mean
  y$vol_m12[y$ID==IDs[i]] <- x[, rollapplyr(v_temp, 12, (mean), fill = NA, na.rm=T)]
}
user1482923
  • 211
  • 3
  • 11
  • [This may be helpful](https://stackoverflow.com/questions/34754786/how-to-create-a-matrix-by-averaging-the-elements-of-another-matrix-in-r/34755233#34755233) – user5249203 Feb 05 '16 at 19:39
  • 1
    See: http://stackoverflow.com/questions/29851637/efficiently-perform-row-wise-distribution-test. Seems likely that minor mods to that Rcpp code might succeed. – IRTFM Feb 05 '16 at 19:43

1 Answers1

1

Thanks for the replies. Following 42 advice, I've produced the following code:

library(RcppRoll)
# Ignore outliers
y[, w_out := quantile(w, c(outlier), na.rm = T), by=ID]
y[, v_out := quantile(v, c(1-outlier), na.rm = T), by=ID]
y[((v <= v_out) & (w >= w_out)), v_temp := v]
y[,w_out := NULL]
y[,v_out := NULL]

y[, v_m12 := roll_mean(as.matrix(v_temp), n =12L, fill = NA,
                     align = c("right"), normalize = TRUE, na.rm = T), by = ID]

System time is about .59 seconds against 10.36 for the solution bellow, which uses rollapplyr (but probably it is possible to make the outlier removal more efficient).

y[, v_m12 :=rollapplyr(v_temp, 12, (mean), fill = NA, na.rm=T), by = ID]
user1482923
  • 211
  • 3
  • 11