How to compute rolling covariance more efficiently

Question

I am trying to compute a rolling covariance between a set of data (each column of my x variable) and one other (y variable) in R. I thought I could use one of the apply functions, but can not find how to roll two set of inputs at the same time. Here is what I tried :

 set.seed(1)
 x<-matrix(rnorm(500),nrow=100,ncol=5)
 y<-rnorm(100)
 rollapply(x,width=5,FUN= function(x) {cov(x,y)})
 z<-cbind(x,y)
 rollapply(z,width=5, FUN=function(x){cov(z,z[,6])})

But none is doing what I would like. One solution I found is to use a for loop, but wondering if I can be more efficient in R than :

dResult<-matrix(nrow=96,ncol=5)
for(iLine in 1:96){
    for(iCol in 1:5){
        dResult[iLine,iCol]=cov(x[iLine:(iLine+4),iCol],y[iLine:(iLine+4)])
    }
}

which gives me the expected result :

head(dResult)


           [,1]       [,2]        [,3]        [,4]        [,5]
[1,]  0.32056460 0.05281386 -1.13283586 -0.01741274 -0.01464430
[2,] -0.03246014 0.78631603 -0.34309778  0.29919297 -0.22243572
[3,] -0.16239479 0.56372428 -0.27476604  0.39007645  0.05461355
[4,] -0.56764687 0.09847672  0.11204244  0.78044096 -0.01980684
[5,] -0.43081539 0.01904417  0.01282632  0.35550327  0.31062580
[6,] -0.28890607 0.03967327  0.58307743  0.15055881  0.60704533

Nice job on a thorough first post. – the_skua Jul 28 '16 at 19:39 — the_skua, Jul 28 '16 at 19:39

score 8 · Answer 1 · answered Jul 28 '16 at 20:14

8

This is a solution with rollapply() and sapply():

sapply(1:5, function(j) rollapply(1:100, 5, function(i) cov(x[i, j], y[i])))

I think that it is more readable and more R-ish than the solution with for-loops, but I checked with microbenchmark and it seems to be slower.

answered Jul 28 '16 at 20:14

Stibu

15,166
6
57
71

Oh I see my error (my R is still busy) well done! +1 – Jan Sila Jul 28 '16 at 20:21
It is working indeed, thanks. I guess we are not gaining time as you are somehow creating two loops on indices here as well, not really _applying_ the _cov_ function to the x and y data directly, as you could maybe with _mapply_ – Djiggy Jul 28 '16 at 20:30

score 8 · Accepted Answer · answered Jul 28 '16 at 20:18

set.seed(1)
x<-as.data.frame(matrix(rnorm(500),nrow=100,ncol=5))
y<-rnorm(100)

library(zoo)

covResult = sapply(x,function(alpha) {

cov_value = rollapply(cbind(alpha,y),width=5,FUN = function(beta) cov(beta[,1],beta[,2]),by.column=FALSE,align="right") 

return(cov_value)

})

head(covResult)
#              V1         V2          V3          V4          V5
#[1,]  0.32056460 0.05281386 -1.13283586 -0.01741274 -0.01464430
#[2,] -0.03246014 0.78631603 -0.34309778  0.29919297 -0.22243572
#[3,] -0.16239479 0.56372428 -0.27476604  0.39007645  0.05461355
#[4,] -0.56764687 0.09847672  0.11204244  0.78044096 -0.01980684
#[5,] -0.43081539 0.01904417  0.01282632  0.35550327  0.31062580
#[6,] -0.28890607 0.03967327  0.58307743  0.15055881  0.60704533

Also check out:

library(PerformanceAnalytics)
?chart.rollingCorrelation

This is exactly what I was trying to do in my second attempt with the _z_ variable, but my command of the _apply_ functions is still too limited. Thanks a lot ! — Djiggy, Jul 28 '16 at 20:34
If I was right in my comment to @Stibu , I guess this one should be also be faster than the _for_ loop I did — Djiggy, Jul 28 '16 at 20:36

score 3 · Answer 3 · answered Nov 23 '16 at 13:29

If you need something faster, and you do not need any of the non-default arguments to cov, you can use TTR::runCov. Note that it pads with leading NA by default.

The speed difference will matter more on larger data. Here's an example of how to use it:

cov_joshua <- function() {
  apply(x, 2, function(x, y) TTR::runCov(x, y, 5), y = y)
}

And here's a comparison to the currently accepted answer using the small dataset the OP provided:

cov_osssan <- function() {
  f <- function(b) cov(b[,1], b[,2])
  apply(x, 2, function(a) {
    rollapplyr(cbind(a,y), width=5, FUN = f, by.column=FALSE)
  })
}
require(zoo)  # for cov_osssan
require(microbenchmark)
set.seed(1)
nr <- 100
nc <- 5
x <- matrix(rnorm(nc*nr),nrow=nr,ncol=nc)
y <- rnorm(nr)
microbenchmark(cov_osssan(), cov_joshua())
# Unit: milliseconds
#          expr       min        lq    median       uq      max neval
#  cov_osssan() 22.881253 24.569906 25.625623 27.44348 32.81344   100
#  cov_joshua()  5.841422  6.170189  6.706466  7.47609 31.24717   100
all.equal(cov_osssan(), cov_joshua()[-(1:4),])  # rm leading NA
# [1] TRUE

Now, using a larger data set:

system.time(cov_joshua())
#    user  system elapsed 
#   2.117   0.032   2.158 
system.time(cov_osssan())
# ^C
# Timing stopped at: 144.957 0.36 145.491

I got tired of waiting (after ~2.5 minutes) for cov_osssan to complete.

score 1 · Answer 4 · answered Jul 28 '16 at 20:08

1

Right now I'm running some long simulations, so cant use R, but reckon this should work it. The outer apply by columns will take the column, pass it to rollapply, where it will be used to do the rolling window covariance with y. Hopefully :D

apply(x,2,function(x) rollapply(x,width=5,function(z) cov(x,y)))

answered Jul 28 '16 at 20:08

Jan Sila

1,554
3
17
36

This does not work. You are always calculating the covariance of the complete vectors x and y inside `rollaply()`. Therefore, each column contains the same value repeated 96 times. – Stibu Jul 28 '16 at 20:16

How to compute rolling covariance more efficiently

4 Answers4