When is it advisable to use sparse matrices in R?

Question

I have been tinkering with power simulations recently and I have the following code:

library(MASS)
library(Matrix)

simdat <- data.frame(mmm = rep(rep(factor(1:2,
                                          labels=c("m1", "m2")),
                                   each = 2),
                               times = 2800),
                 ttt = rep(factor(1:2,
                                  labels = c("t1", "t2")),
                           times = 5600),
                 sss = rep(factor(1:70),
                            each = 160),
                 iii = rep(rep(factor(1:40),
                               each = 4),
                           times = 70))

beta <- c(1, 2)

X1 <- model.matrix(~ mmm,
                   data = simdat)

Z1 <- model.matrix(~ ttt,
                   data = simdat)

X1 and Z1 are 11200x2 matrices. With the help of Stackoverflow I managed to make my calculations a lot more efficient than they were before:

funab <- function(){
    ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    Mb <- as.vector(X1 %*% beta)

    M1 <- rowSums(Z1 * ran_sub[rep(1:70,
                                        each = 160),])

    M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4),
                                        times = 70),])

    Mout <- Mb + M1 + M2

    Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}

Y will then be a vector of length 11200. I then replicate this function a lot (say 1000 times):

sim <- replicate(n        = 1000,
                 expr     = funab()},
                 simplify = FALSE)

sim will be a 11200x1000 list. Given that I want to do this a lot more and possibly include more code into funab() I wonder if it is advisable to use sparse matrices for X1 and Z1 in the calculations in funab() as it is now?

Are you familiar with `microbenchmark`? You can use it to compare performance across functions, aka. benchmarking. Simply, `install.packages(c("microbenchmark"), dependencies = TRUE)`, `require(microbenchmark)` and `example(microbenchmark)`, you know the drill. I've used `microbenchmark` in [this SO answer](http://stackoverflow.com/a/25082086/1305688). — Eric Fail, Aug 03 '14 at 20:26
If you do run a test it would be interesting to add that to your question. — Eric Fail, Aug 05 '14 at 13:53

score 1 · Accepted Answer · answered Aug 06 '14 at 15:52

Ok, I've tried to follow an advice given in the comments to my question and ran a test with the microbenchmark package. To make copy and pasting easier I will repeat the code from above:

library(MASS)
library(Matrix)

simdat <- data.frame(mmm = rep(rep(factor(1:2,
                                          labels=c("m1", "m2")),
                                   each = 2),
                               times = 2800),
                 ttt = rep(factor(1:2,
                                  labels = c("t1", "t2")),
                           times = 5600),
                 sss = rep(factor(1:70),
                            each = 160),
                 iii = rep(rep(factor(1:40),
                               each = 4),
                           times = 70))

beta <- c(1, 2)

X1 <- model.matrix(~ mmm,
                   data = simdat)

Z1 <- model.matrix(~ ttt,
                   data = simdat)

I now create the same matrices as sparse matrices:

sparseX1 <- sparse.model.matrix(~ mmm,
                                data = simdat)

sparseZ1 <- sparse.model.matrix(~ ttt,
                                data = simdat)

I then set up the two functions:

funab_sparse <- function(){
    ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    Mb <- as.vector(sparseX1 %*% beta)

    M1 <- Matrix::rowSums(sparseZ1 * ran_sub[rep(1:70,
                                        each = 160),])

    M2 <- Matrix::rowSums(sparseZ1 * ran_ite[rep(rep(1:40, each = 4),
                                        times = 70),])

    Mout <- Mb + M1 + M2

    Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}

funab <- function(){
    ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))

    Mb <- as.vector(X1 %*% beta)

    M1 <- rowSums(Z1 * ran_sub[rep(1:70,
                                        each = 160),])

    M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4),
                                        times = 70),])

    Mout <- Mb + M1 + M2

    Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}

library(microbenchmark)
res <- microbenchmark(funab(), funab_sparse(), times = 1000)

and get the results:

> res <- microbenchmark(funab(), funab_sparse(), times = 1000)
> res
Unit: milliseconds
           expr      min       lq   median       uq      max neval
        funab() 2.200342 2.277006 2.309587 2.481627 69.99895  1000
 funab_sparse() 8.419564 8.568157 9.666248 9.874024 75.88907  1000

Assuming that I did not make any substantial mistakes I can conclude that with this particular way of doing the calculations using sparse matrices will not speed up my code.

When is it advisable to use sparse matrices in R?

1 Answers1