
I want rollapply-like functionality on non-time-series data, computed over a rolling window, so that there is no need to convert the data into a zoo object and back again. Is there a way this can be done on a very large data set?

Edit

I am using

```r
rollapply(zoo(SPYTS[, "Close"]), 2, function(x) x[1] + x[2], fill=0, align="right")
```

on 1 million data points, and it never seems to finish calculating. By contrast, something like

```r
SPYTS$LnReturns <- (rbind(0, as.data.frame(log(SPYTS[1:(nrow(SPYTS) - 1), "Close"] / SPYTS[2:nrow(SPYTS), "Close"]))))
```

just takes a few seconds.

The function `function(x) x[1] + x[2]` is just a placeholder. The actual function I have in mind is slightly different.

  • [This](http://stackoverflow.com/q/7225992/324364) question might provide a good starting point. – joran Nov 25 '11 at 20:59
  • Just so you know, cross-posting like this is generally discouraged. If you choose the "wrong" site, it can be automatically migrated to the appropriate spot. In this case, SO is probably the better choice. – joran Nov 25 '11 at 21:02
  • BTW `function(x) x[1] + x[2]` is just a placeholder for testing the line of code. The actual function I have in mind is slightly different. – Suminda Sirinath S. Dharmasena Nov 25 '11 at 21:37
  • I am using the latest. I updated my packages. – Suminda Sirinath S. Dharmasena Nov 26 '11 at 04:52
  • This discussion is getting a bit long so I have deleted my comments and transferred them to an answer. – G. Grothendieck Nov 26 '11 at 12:54

1 Answer


This answer is an expanded version of my earlier comments which I have now deleted.

zoo's rollapply already supports plain vectors and matrices. Furthermore, its rollapply routine extracts the plain vector or matrix from a zoo object before operating on it, so there is no reason for a zoo object to take materially longer than a non-zoo object. The slowness you observed was a bug in rollapply (the extraction was not taking place properly) that was fixed in early November in the development version. That version is on R-Forge and can be installed like this:

```r
install.packages("zoo", repos = "http://r-forge.r-project.org")
```
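As a small illustration of the point that rollapply works on plain vectors, here is a minimal sketch, assuming a current zoo where `rollapplyr` is the right-aligned form of `rollapply`. It applies the question's placeholder function to an ordinary numeric vector; the vector `x` is made-up data, and no zoo object is created by the caller.

```r
library(zoo)

# A plain numeric vector (made-up data); no zoo() call anywhere.
x <- c(1, 4, 9, 16, 25)

# Right-aligned windows of width 2. The first position has no full window,
# so fill = 0 pads it, as in the call from the question.
y <- rollapplyr(x, 2, function(w) w[1] + w[2], fill = 0)

print(y)  # 0 5 13 25 41, returned as a plain vector, not a zoo object
```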

On the other hand, the generality of rollapply means it is going to be much slower than special-purpose routines or vectorized operations.

zoo does have some specialized versions of rollapply (rollmean, rollmedian, rollmax) that are optimized for particular operations and will be much faster. If you can build what you need out of those (e.g. a rolling sum of k terms is k times a rolling mean of k terms), you can get substantial speedups. Faster still is building the rolling result from plain vectorized operations such as `+`.
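To make the rolling-sum remark concrete, the sketch below computes a right-aligned rolling sum of k = 3 terms three ways on made-up data: with `rollapplyr`, as k times `rollmeanr`, and from plain `+` on shifted copies of the vector. The shifted-copies construction (using base R's `Reduce`) is just one way to do the last step; it is not something zoo provides.

```r
library(zoo)

set.seed(1)
a <- rnorm(20)   # made-up data
k <- 3
n <- length(a)

# 1. General but slow: calls sum() once per window.
s1 <- rollapplyr(a, k, sum, fill = 0)

# 2. Specialized: a rolling sum of k terms is k times the rolling mean.
s2 <- k * rollmeanr(a, k, fill = 0)

# 3. Plain vector arithmetic: add k shifted copies of `a`.
#    Copy j (j = 0, ..., k - 1) is a[(1 + j):(n - k + 1 + j)], so element m of
#    the total is a[m] + ... + a[m + k - 1], the window ending at m + k - 1.
#    The first k - 1 positions get the fill value 0.
shifted <- lapply(0:(k - 1), function(j) a[(1 + j):(n - k + 1 + j)])
s3 <- c(rep(0, k - 1), Reduce(`+`, shifted))

stopifnot(isTRUE(all.equal(s1, s2)), isTRUE(all.equal(s1, s3)))
```

Only the third version avoids calling an R function once per window; with k = 2 it reduces to the `c(0, a[-1] + a[-n])` expression used in the benchmark.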

The question says the function is just an example, but the particular function can make a big difference in speed, since it determines whether speedups like the ones above are available.
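Since the real window function is not given, here is a hedged sketch of the general pattern for width-2 windows: if the function body is itself vectorized, it can be applied once to the lagged vectors `x[-n]` and `x[-1]` instead of once per window. `roll2` is a hypothetical helper name, and the window function used is the log-ratio from the question's fast line, rewritten as `log(w[1] / w[2])`; the prices are made up.

```r
library(zoo)

# Hypothetical helper (not part of zoo): apply a vectorized two-argument
# function f(first, second) to every right-aligned window of width 2 in x,
# padding the first position with `fill`, like rollapplyr(x, 2, ..., fill = fill).
roll2 <- function(x, f, fill = 0) {
  n <- length(x)
  if (n < 2) return(rep(fill, n))   # no complete window
  c(fill, f(x[-n], x[-1]))          # x[-n]: first element of each window, x[-1]: second
}

prices <- c(100, 101, 99.5, 102, 103.5)   # made-up data

# The same window function two ways: once over whole vectors, once per window.
r_fast <- roll2(prices, function(first, second) log(first / second))
r_slow <- rollapplyr(prices, 2, function(w) log(w[1] / w[2]), fill = 0)

stopifnot(isTRUE(all.equal(r_fast, r_slow)))
```

This only helps when `f` works elementwise on vectors; a window function that needs the whole window at once (a median, say) still calls for rollapply or one of zoo's specialized routines.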

For example, running 3 replications of each of rollapply, 2 * rollmean and a simple vectorized addition shows this:

```r
> library(zoo)
> library(rbenchmark)
> n <- 10^4
> set.seed(123)
> a <- rnorm(n)
> benchmark(rollapply = a1 <- rollapplyr(a, 2, sum, fill = 0),
+    rollmean = a2 <- 2 * rollmeanr(a, 2, fill = 0),
+    add = a3 <- c(0, a[-1] + a[-n]), replications = 3, order = "relative")
       test replications elapsed relative user.self sys.self user.child sys.child
3       add            3    0.00  0.00000      0.00        0         NA        NA
2  rollmean            3    0.07  1.00000      0.08        0         NA        NA
1 rollapply            3    1.85 26.42857      1.84        0         NA        NA
> 
> all.equal(a1, a2)
[1] TRUE
> all.equal(a1, a3)
[1] TRUE
```
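The benchmark above uses 10^4 points. To check the two fast approaches at the question's scale of one million points, something along these lines could be run; rollapply is left out because it is the slow one, and since timings depend on the machine, none are quoted here.

```r
library(zoo)

set.seed(123)
n <- 10^6
a <- rnorm(n)

# Specialized zoo routine: rolling sum of 2 terms as 2 * rolling mean.
t_mean <- system.time(a2 <- 2 * rollmeanr(a, 2, fill = 0))
# Plain vectorized addition of the series and its lag.
t_add <- system.time(a3 <- c(0, a[-1] + a[-n]))

print(c(rollmean = t_mean[["elapsed"]], add = t_add[["elapsed"]]))
stopifnot(isTRUE(all.equal(a2, a3)))
```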
G. Grothendieck
  • I am getting the following:`> install.packages("zoo", repo = "http://r-forge.r-project.org") Installing package(s) into ‘C:/REVOLU~1/R-COMM~1.3/R-212~1.2/library’ (as ‘lib’ is unspecified) Warning in install.packages : cannot open: HTTP status was '404 Not Found' Warning in install.packages : cannot open: HTTP status was '404 Not Found' Warning in install.packages : unable to access index for repository http://r-forge.r-project.org/bin/windows/contrib/2.12 Warning in install.packages : package ‘zoo’ is not available` – Suminda Sirinath S. Dharmasena Nov 26 '11 at 19:11
  • Plain `install.packages("zoo")` worked, but I am not sure it picked the correct version. – Suminda Sirinath S. Dharmasena Nov 26 '11 at 19:15
  • The package that was installed is contrib/2.12/zoo_1.7-4.zip. – Suminda Sirinath S. Dharmasena Nov 26 '11 at 19:16
  • You will need to upgrade to R 2.14.0 and follow the instructions in the post exactly. – G. Grothendieck Nov 26 '11 at 21:19