0

I'm trying to do a moving average (similar to roll_mean in RcppRoll) except that, at each window, I'd like to trim the outliers (e.g. just take the 5th-95th percentile of values).

As an example, given a rolling window of

v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

And given that I'd like the 10th-90th percentile of values, I should get an answer of 5.5 (1 and 10 will be excluded, and a mean is taken on the rest of the values (2 to 9).

I'm unfortunately not able to use functions like RcppRoll::roll_mean for this, since the trimming needs to be done at each rolling window.

I was able to do this by feeding a custom mean function to zoo::rollapply - but it's simply working too slowly for my use case (>1e6 rows).

I've looked at various packages that support rolling functions (e.g. RcppRoll, zoo, TTR, caTools, roll etc.) but none of them seem to support this trimming functionality.

I'm thinking of resorting to using Rcpp to build a custom, fast roll function, but am relatively unfamiliar with that framework. Not sure if there are better solutions.

Any help will be appreciated here.

  • 2
    Welcome to SO! Can you Please give a [mcve]? Read http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example ... then edit your question: http://stackoverflow.com/posts/43666402/edit – jogo Apr 27 '17 at 19:48

2 Answers2

-1

I suppose you could do something like

rollapply(data, 10, function(x) mean(x[x>=quantile(x,0.1) & x<=quantile(x,0.9)]))
Andrew Gustar
  • 17,295
  • 1
  • 22
  • 32
  • I tried doing that but the zoo functions like rollapply are terribly slow for larger data sets - so will likely need something implemented using Rcpp – caloriefree987 Apr 27 '17 at 19:48
  • Have you tried the `trim` option in `mean` - e.g. `rollapply(data,10,mean,trim=0.1)`? Might be a little faster. – Andrew Gustar Apr 27 '17 at 20:01
  • This answer suggests that it might be faster in base-R than using zoo::rollapply... http://stackoverflow.com/questions/30090336/why-is-zoorollmean-slow-compared-to-a-simple-rcpp-implementation – Andrew Gustar Apr 27 '17 at 20:05
-1

Here is a function in base-R that is a lot faster than zoo::rollapply. It is probably possible to streamline it further, but the principle seems to work. It avoids the sort on each window by using a 'rolling' sorted vector vec and just updating it for the old and new elements as the window rolls.

require(zoo) #just for comparison at the end
require(microbenchmark)
data <- sample(1:100,1000,TRUE)

rollMeanTrim <- function(dat,window,trim){
   n <- length(dat)-window+1
   out <- rep(NA,n)
   exc <- round(trim*window)
   vec <- sort(dat[1:window])
   out[1] <- mean(vec[(1+exc):(window-exc)])
   for(i in 2:n){
      old <- dat[i-1]
      new <- dat[i+window-1]
      oldpos <- match(old,vec)
      vec <- vec[-oldpos]
      newpos <- match(1,sign(vec-new))
      if(is.na(newpos)) {
         vec <- c(vec,new)
      } else if(newpos==1) {
         vec <- c(new,vec)
      } else {
         vec <- c(vec[1:(newpos-1)],new,vec[newpos:(window-1)])
      }
      out[i] <- mean(vec[(1+exc):(window-exc)])
   }
   return(out)
}

microbenchmark(rollMeanTrim(data,10,0.1),rollapply(data, 10, mean, trim=0.1))

Unit: milliseconds
                                  expr      min       lq     mean   median       uq      max neval
           rollMeanTrim(data, 10, 0.1)  63.4825  81.2573 149.4777  98.8031 146.4868 1163.929   100
 rollapply(data, 10, mean, trim = 0.1) 213.8742 330.3273 659.2942 412.7529 773.4881 2761.591   100
Andrew Gustar
  • 17,295
  • 1
  • 22
  • 32
  • thanks! Great to see that even base-R can provide a leg up over zoo for this case. I ended up needing a 2-3 order magnitude speedup, though, so ended up going the Rcpp route – caloriefree987 May 04 '17 at 20:31
  • Yes, I was surprised, but I suppose a specific routine is often going to beat a generalised one. How much faster did the Rcpp solution turn out to be? – Andrew Gustar May 04 '17 at 21:43
  • i got a ~100x speedup via Rcpp on about 1e6 rows – caloriefree987 May 05 '17 at 17:11