How to speed up / improve a rolling average function?

Question

My data frame has 988,785 observations of 3 variables. A smaller example of my data is below:

Names <- c("Jack", "Jill", "John")
RawAccelData <- data.frame(
  # Samples 1 to 60,000 (10 min at 100 Hz) for each of the three IDs
  Sample = as.numeric(rep(1:60000, times = 3)),
  Acceleration = rnorm(3 * 60000),
  ID = rep(Names, each = 60000)
)

The sample rate of my equipment is 100 Hz. I wish to calculate a rolling average of Acceleration for each ID over windows of 1 to 10 seconds (100 to 1,000 samples). I do this using the following:

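library(dplyr)
library(zoo)

# For each summary function and each window width (100 to 1000 samples,
# i.e. 1 to 10 s at 100 Hz), compute a right-aligned rolling statistic
# per ID and append it to RawAccelData as a new column.
for (summaryFunction in c("mean")) {
  for (i in seq(100, 1000, by = 100)) {
    tempColumn <- RawAccelData %>%
      group_by(ID) %>%
      transmute(rollapply(Acceleration,
                          width = i,
                          FUN = summaryFunction,
                          align = "right",
                          fill = NA,
                          na.rm = TRUE))
    # Column 1 is the grouping variable ID; column 2 holds the result.
    colnames(tempColumn)[2] <- paste("Rolling", summaryFunction, as.character(i), sep = ".")
    RawAccelData <- bind_cols(RawAccelData, tempColumn[2])
  }
}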

However, I now need to calculate a rolling average over windows of 1 to 10 minutes (6,000 to 60,000 samples). I can do this with the code above by substituting the following line:

for (i in seq(6000, 60000, by = 6000)) {

However, this takes hours to run on my full dataset and makes RStudio on my Mac (details below) hang. Is there a way I can a) tidy up the above code, or b) use a different package or method to get a result more quickly?

Thank you.

R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] zoo_1.7-12  dplyr_0.4.3

loaded via a namespace (and not attached):
 [1] lazyeval_0.1.10 magrittr_1.5    R6_2.1.1        assertthat_0.1  parallel_3.2.3  DBI_0.3.1      
 [7] tools_3.2.3     Rcpp_0.12.2     grid_3.2.3      lattice_0.20-33

G. Grothendieck · Answer 1 · 2016-03-14T15:52:35.170

It is running slowly for two reasons:

- The code in the question defeats rollapply's ability to detect that mean is being passed, because it assigns mean (as the string "mean") to a variable and passes that variable. (In the case of mean, rollapply calls rollmean, which contains optimized code for that case.) Had the code in the question passed mean directly, or used rollmean, it would have been substantially faster.
- filter does not remove NAs, so for an apples-to-apples comparison one should not use na.rm = TRUE in rollapply. Using it also defeats the optimization.
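The detection described above can be checked directly. This is a minimal sketch, assuming the zoo package is installed: giving FUN through a variable sends rollapplyr down the general per-window path, while passing mean directly (with no extra arguments such as na.rm) or calling rollmeanr uses the optimized rolling mean. The three results should agree numerically; only the speed differs, which you can time on your own data with system.time.

```r
# A minimal sketch, assuming the zoo package is installed.
library(zoo)

set.seed(1)
x <- rnorm(1e4)
k <- 100                  # window width in samples (1 s at 100 Hz)

# General path: FUN arrives through a variable, so rollapply cannot tell
# that it is mean and calls it once per window.
f <- "mean"
slow <- rollapplyr(x, k, FUN = f, fill = NA)

# Optimized path: mean passed directly, with no extra arguments such as
# na.rm = TRUE, so rollapply can hand the work to rollmean.
fast <- rollapplyr(x, k, mean, fill = NA)

# Or call the right-aligned rolling mean directly.
direct <- rollmeanr(x, k, fill = NA)

# Compare timings on your own data, e.g.
# system.time(rollapplyr(x, k, FUN = f, fill = NA))
# system.time(rollmeanr(x, k, fill = NA))
```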

For example, in this comparison rollapply runs more than twice as fast as filter:

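library(zoo)
library(rbenchmark)

set.seed(123)
r <- rnorm(10000)
# Time 100 replications of a width-100 right-aligned moving average each way;
# [1:4] keeps the test, replications, elapsed and relative columns.
benchmark(filter = stats::filter(r, rep(1/100, 100), sides = 1),
          rollapply = rollapplyr(r, 100, mean, fill = NA))[1:4]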

giving:

       test replications elapsed relative
1    filter          100    3.75    2.119
2 rollapply          100    1.77    1.000

The speed may, of course, vary according to the width, data length and other aspects of the input since this is only one test.
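Width matters because stats::filter, with its default convolution method, computes each output as a sum over all k weights, so its work grows in proportion to the window width; for the 6,000 to 60,000 sample windows in the question that cost adds up. A different technique, not used in either answer, is the cumulative-sum difference: the sum of x[(i - k + 1):i] equals cumsum(x)[i] minus cumsum(x)[i - k], so each output costs the same whatever k is. This is a minimal sketch: the helper name roll_mean_cumsum is hypothetical, and it assumes x contains no NA values (an NA would propagate into every later cumulative sum). On very long series the running sum can also accumulate rounding error, so it is worth comparing against filter on your own data.

```r
# Hypothetical helper (not from either answer): right-aligned rolling mean
# of width k using cumulative sums. Assumes x contains no NA values.
roll_mean_cumsum <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)       # first k - 1 positions stay NA, like filter
  if (k > n) return(out)
  cs <- c(0, cumsum(x))         # cs[j + 1] is the sum of x[1:j]
  idx <- k:n
  # sum(x[(i - k + 1):i]) = cs[i + 1] - cs[i - k + 1]
  out[idx] <- (cs[idx + 1] - cs[idx - k + 1]) / k
  out
}

# Usage: compare with the stats::filter version from the accepted answer.
set.seed(123)
r <- rnorm(10000)
a <- roll_mean_cumsum(r, 100)
b <- as.numeric(stats::filter(r, rep(1 / 100, 100), sides = 1))
all.equal(a, b)
```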

I appreciate the explanation of why it is running slowly; this will help me write code in the future. Thank you! — user2716568, Mar 15 '16 at 22:16

score 1 · Accepted Answer · edited May 23 '17 at 10:28


I'm not sure if you have other summary functions in mind, but at least for the mean, you can speed things up by using stats::filter instead of rollapply: transmute(stats::filter(Acceleration, rep(1/i, i), sides = 1))

(Other options are discussed in "Calculating moving average in R".) Measured with system.time, this cut my run time from 117 s to 4 s!
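To see what that stats::filter call computes: with weights rep(1/k, k) and sides = 1, each output is the mean of the current value and the k - 1 values before it, and the first k - 1 outputs are NA, which matches a right-aligned rolling mean with fill = NA. The stats:: prefix matters because loading dplyr masks stats::filter with dplyr's row-filtering verb. The toy vector below is purely illustrative.

```r
# Right-aligned moving average of width k with stats::filter.
x <- c(2, 4, 6, 8, 10)
k <- 3
y <- stats::filter(x, rep(1 / k, k), sides = 1)   # returns a "ts" object
y <- as.numeric(y)                                # drop the ts attributes
print(y)
# NA NA 4 6 8 (up to floating-point rounding)
```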

You can also run the loop iterations in parallel. Instead of

for (i in seq(6000, 60000, by = 6000)) {

try:

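library(parallel)

for (summaryFunction in c("mean")) {
  # Each width is computed in a separate forked process; mclapply uses
  # getOption("mc.cores", 2L) workers unless mc.cores is given.
  rollCols <- mclapply(seq(100, 1000, by = 100), function(i) {
    tempColumn <- RawAccelData %>%
      group_by(ID) %>%
      # stats:: is needed because dplyr masks filter() with its row filter.
      transmute(stats::filter(Acceleration, rep(1/i, i), sides = 1))
    colnames(tempColumn)[2] <- paste("Rolling", summaryFunction, as.character(i), sep = ".")
    tempColumn[2]
  }, mc.cores = detectCores())
}

# Append all the new rolling-mean columns to the original data.
RawAccelData <- cbind(RawAccelData, do.call(cbind, rollCols))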

This cut my run time from 72 s to 40 s, but the speed-up depends on how many cores your computer has.
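If you would rather not depend on dplyr, the same one-column-per-width pattern can be sketched in base R. This is a minimal sketch, not the answerer's code: the helper name add_rolling_means is hypothetical, and it assumes rows are already ordered by Sample within each ID and that every group has at least as many rows as the largest width (stats::filter stops with an error when the filter is longer than the series). ave() applies the rolling mean within each ID and returns the results in the original row order, so the columns can be bound straight onto the data frame. mclapply forks processes, which works on macOS and Linux; on Windows keep cores = 1.

```r
library(parallel)

# Hypothetical helper: append one right-aligned rolling-mean column per
# width in `widths` to `df`, computed separately within each df$ID.
# Assumes rows are ordered by Sample within each ID.
add_rolling_means <- function(df, widths, cores = 1L) {
  roll_mean <- function(v, k) {
    as.numeric(stats::filter(v, rep(1 / k, k), sides = 1))
  }
  # One list element per width; with cores > 1 the widths run in
  # parallel forked processes (not supported on Windows).
  cols <- mclapply(widths, function(k) {
    # ave() splits Acceleration by ID, applies roll_mean to each piece
    # and returns the results in the original row order.
    ave(df$Acceleration, df$ID, FUN = function(v) roll_mean(v, k))
  }, mc.cores = cores)
  names(cols) <- paste("Rolling", "mean", widths, sep = ".")
  cbind(df, as.data.frame(cols))
}

# Usage with the question's data, e.g. 1 to 10 minute windows at 100 Hz:
# RawAccelData <- add_rolling_means(RawAccelData,
#                                   seq(6000, 60000, by = 6000),
#                                   cores = detectCores())
```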

answered Mar 14 '16 at 01:01 by user20061 (edited by Community)

I am only interested in the mean, so thank you for the concise and useful answer. – user2716568 Mar 14 '16 at 02:23
In your answer, could you please include the full code you used to run the for loops in `parallel`? My result just prints to the console rather than being appended to `RawAccelData` as in the code in my question. – user2716568 Mar 14 '16 at 03:17
You're right, I missed some details; I have just edited the full code into the answer. – user20061 Mar 14 '16 at 04:33
Perfect, appreciate it! This was much quicker than my initial code. – user2716568 Mar 14 '16 at 04:48
