5

basically i want to perform diagonal averaging in R. Below is some code adapted from the simsalabim package to do the diagonal averaging. Only this is slow.

Any suggestions for vectorizing this instead of using sapply?

reconSSA <- function(S,v,group=1){
### S : matrix
### v : vector

    N <- length(v)
    L <- nrow(S)
    K <- N-L+1
    XX <- matrix(0,nrow=L,ncol=K)
    IND <- row(XX)+col(XX)-1
    XX <- matrix(v[row(XX)+col(XX)-1],nrow=L,ncol=K)
    XX <- S[,group] %*% t(t(XX) %*% S[,group])

    ##Diagonal Averaging
    .intFun <- function(i,x,ind) mean(x[ind==i])

    RC <- sapply(1:N,.intFun,x=XX,ind=IND)
    return(RC)
}

For data you could use the following

data(AirPassengers)
v <- AirPassengers
L <- 30
T <- length(v)
K <- T-L+1

x.b <- matrix(nrow=L,ncol=K)
x.b <- matrix(v[row(x.b)+col(x.b)-1],nrow=L,ncol=K)
S <- eigen(x.b %*% t(x.b))[["vectors"]] 
out <- reconSSA(S, v, 1:10)
Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
pslice
  • 503
  • 1
  • 4
  • 13

2 Answers2

3

You can speed up the computation by almost 10 times with the help of a very specialized trick with rowsum:

reconSSA_1 <- function(S,v,group=1){
### S : matrix
### v : vector
    N <- length(v)
    L <- nrow(S)
    K <- N-L+1
    XX <- matrix(0,nrow=L,ncol=K)
    IND <- row(XX)+col(XX)-1
    XX <- matrix(v[row(XX)+col(XX)-1],nrow=L,ncol=K)
    XX <- S[,group] %*% t(t(XX) %*% S[,group])
    ##Diagonal Averaging
    SUMS <- rowsum.default(c(XX), c(IND))
    counts <- if(L <= K) c(1:L, rep(L, K-L-1), L:1)
    else c(1:K, rep(K, L-K-1), K:1)
    c(SUMS/counts)
}

all.equal(reconSSA(S, v, 1:10), reconSSA_1(S, v, 1:10))
[1] TRUE

library(rbenchmark)

benchmark(SSA = reconSSA(S, v, 1:10),
          SSA_1 = reconSSA_1(S, v, 1:10),
          columns = c( "test", "elapsed", "relative"),
          order = "relative")


    test elapsed relative
2 SSA_1    0.23   1.0000
1   SSA    2.08   9.0435

[Update: As Joshua suggested it could be speed up even further by using the crux of the rowsum code:

reconSSA_2 <- function(S,v,group=1){
### S : matrix
### v : vector
    N <- length(v)
    L <- nrow(S)
    K <- N-L+1
    XX <- matrix(0,nrow=L,ncol=K)
    IND <- c(row(XX)+col(XX)-1L)
    XX <- matrix(v[row(XX)+col(XX)-1],nrow=L,ncol=K)
    XX <- c(S[,group] %*% t(t(XX) %*% S[,group]))
    ##Diagonal Averaging
    SUMS <- .Call("Rrowsum_matrix", XX, 1L, IND, 1:N, 
                  TRUE, PACKAGE = "base")
    counts <- if(L <= K) c(1:L, rep(L, K-L-1), L:1)
    else c(1:K, rep(K, L-K-1), K:1)
    c(SUMS/counts)
}

   test elapsed  relative
3 SSA_2   0.156  1.000000
2 SSA_1   0.559  3.583333
1   SSA   5.389 34.544872

A speedup of x34.5 comparing to original code!!

]

VitoshKa
  • 8,387
  • 3
  • 35
  • 59
  • Very nice vectorization with `rowsums`! – Joshua Ulrich Oct 27 '10 at 20:58
  • You could make it even faster by only using the parts of `rowsums` that you need: (i.e. the `sort(unique(...))` and the `.Call("Rrowsum_matrix", ...)`. – Joshua Ulrich Oct 27 '10 at 21:10
  • :) rowsum was written for speed, I wish there were more funcs like this, specialized for grouped operations on arrays. – VitoshKa Oct 27 '10 at 21:24
  • @Joshua I remember experimenting with that, a half a year or so ago. It indeed speeds the stuff another 4 times, as far as I could remember. But one small error in inputs and your R session is gone:). Reorder might not be necessary for this example. So it might be quite worth rewriting the rowsum a bit. – VitoshKa Oct 27 '10 at 21:31
  • It turned out that for this particular problem, unique was not necessary and the use of integer groups speed up the whole computation quite considerably. – VitoshKa Oct 27 '10 at 22:07
  • Why is the scaling so different between the 3 different versions? they all have the right shape. the first version preserves the scale though... – pslice Nov 01 '10 at 19:26
  • @pslice, Sorry mate, I don't get what you mean by "scaling". All functions return a vector of diagonal means. So they are identical in that respect. – VitoshKa Nov 02 '10 at 08:42
  • @VitoshKa : sorry about that. appears to be an error in my old code. burning too much midnight oil lately... – pslice Nov 02 '10 at 09:15
  • what if we want groups to be a list? Like groups= as.list(c(1:100))? Right now it won't accept this. I want it to reconstruct all 100 components separately or some form of a list. Any ideas? – pslice Nov 03 '10 at 20:47
  • @pslice, if you really need the group to be a list, do group <- unlist(group,F,F) inside the function. – VitoshKa Nov 04 '10 at 21:13
  • Doesn't that just return them back as a vector? I'm talking about reconstructing say components 1-5 together and 6-10 together ,etc.. the output should be 2 vectors in this ex. not 1. – pslice Nov 04 '10 at 22:52
  • I just used something like this rc<-function(S,v,g) reconSSA_2(S,v,g) groups=1:10 sapply(groups , rc, S=matrix, v=vector) – pslice Nov 05 '10 at 23:31
0

I can't get your example to produce sensible results. I think there are several errors in your function.

  1. XX is used in sapply, but is not defined in the function
  2. sapply works over 1:N, where N=144 in your example, but x.b only has 115 columns
  3. reconSSA simply returns x

Regardless, I think you want:

data(AirPassengers)
x <- AirPassengers
rowMeans(embed(x,30))

UPDATE: I've re-worked and profiled the function. Most of the time is spent in mean, so it may be hard to get this much faster using R code. That said, you can 20% speedup by using sum instead.

reconSSA <- function(S,v,group=1){

    N <- length(v)
    L <- nrow(S)
    K <- N-L+1
    XX <- matrix(0,nrow=L,ncol=K)
    IND <- row(XX)+col(XX)-1
    XX <- matrix(v[row(XX)+col(XX)-1],nrow=L,ncol=K)
    XX <- S[,group] %*% t(t(XX) %*% S[,group])

    ##Diagonal Averaging
    .intFun <- function(i,x,ind) {
        I <- ind==i
        sum(x[I])/sum(I)
    }

    RC <- sapply(1:N,.intFun,x=XX,ind=IND)
    return(RC)
}
Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
  • That's not quite what I am looking for. The idea is to use the structure of x.b (Hankel) and average the anti-diagonals, since we will seek an approximation of x.b , which will likely not have the proper structure(Hankel), so using diagonal averaging alleviates this problem to some extent. This falls under the topic of singular spectrum analysis. I also fixed the referencing you mentioned. – pslice Oct 27 '10 at 16:31
  • I'll take another shot at it if you can explain what you expect the call to `sapply` to do. Your intent is not clear from the code. – Joshua Ulrich Oct 27 '10 at 16:40
  • What I am expecting it to do is on the bottom of page 3. See the link for a PDF. http://bit.ly/ati1ll – pslice Oct 27 '10 at 18:50