
I have a matrix Vmat:

    v1 <- c(4, 8, 3, 5, 9)
    v2 <- c(5, 6, 6, 11, 6)
    v3 <- c(5, 6, 6, 11, 6)
    v4 <- c(8, 6, 4, 4, 3)
    v5 <- c(4, 8, 3, 5, 9)
    v6 <- c(8, 6, 4, 4, 3)
    v7 <- c(3, 2, 7, 7, 4)
    v8 <- c(3, 2, 7, 7, 4)

    row1 <- c(v1, v2)
    row2 <- c(v3, v4)
    row3 <- c(v5, v6)
    row4 <- c(v7, v8)

    Vmat <- rbind(row1, row2, row3, row4)


    Vmat
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    row1    4    8    3    5    9    5    6    6   11     6
    row2    5    6    6   11    6    8    6    4    4     3
    row3    4    8    3    5    9    8    6    4    4     3
    row4    3    2    7    7    4    3    2    7    7     4

I want to stack the matrix by splitting it into two blocks of n = ncol(Vmat)/2 = 5 columns each and placing the second block below the first.

So the output is:

         [,1] [,2] [,3] [,4] [,5]
    [1,]    4    8    3    5    9
    [2,]    5    6    6   11    6
    [3,]    4    8    3    5    9
    [4,]    3    2    7    7    4
    [5,]    5    6    6   11    6
    [6,]    8    6    4    4    3
    [7,]    8    6    4    4    3
    [8,]    3    2    7    7    4
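In other words, for the two-block case the result is just the two column halves bound together row-wise (a minimal base-R restatement of the target), but I need an approach that stays fast when the number of blocks and the matrix size grow:

```r
n <- ncol(Vmat) / 2                                 # 5 columns per block
stacked <- rbind(Vmat[, 1:n], Vmat[, (n + 1):(2 * n)])
```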
wolfsatthedoor
  • Why can apply not be used? – Heroka Sep 26 '15 at 03:43
  • Because it's too slow, the matrix is massive and I'm doing computationally really demanding stuff – wolfsatthedoor Sep 26 '15 at 03:43
  • How massive is massive? – Heroka Sep 26 '15 at 03:46
  • At the very minimum it will be 2 million columns across and 500 rows, but that is the "toy" example. The real example will be more like 20 million across and 16,000 rows, probably bigger even. I want to program it as fast as possible in anticipation – wolfsatthedoor Sep 26 '15 at 03:48
  • Is there anything you've already tried yourself, so we know what not to do? I've been toying with it, but even a dummy matrix by your 'toy' example takes up almost all of my 8GB of RAM. – Heroka Sep 26 '15 at 04:13
  • One possibility would be to convert to `array`, rearrange with `aperm` and then change the dimensions. `n <- 5;ar1 <- array(Vmat, dim=c(4,n,ncol(Vmat)/n)); ar2 <- aperm(ar1, c(1,3,2));dim(ar2) <- c(8,5)` – akrun Sep 26 '15 at 04:14

2 Answers


Here's a potential data.table solution:

# install.packages("data.table", type="source")   # requires 1.9.6+
library(data.table)
vm  <- ncol(Vmat)/2
lst <- lapply(1:vm,function(i)c(i,i+vm))
result <- melt(as.data.table(Vmat),measure=lst)[,variable:=NULL]
result
#    value1 value2 value3 value4 value5
# 1:      4      8      3      5      9
# 2:      5      6      6     11      6
# 3:      4      8      3      5      9
# 4:      3      2      7      7      4
# 5:      5      6      6     11      6
# 6:      8      6      4      4      3
# 7:      8      6      4      4      3
# 8:      3      2      7      7      4

Using a more realistic example:

set.seed(1)
Vmat <- matrix(sample(0:9,16e3*1000,replace=TRUE),nr=16e3)
library(data.table)
system.time({
  vm  <- ncol(Vmat)/2
  lst <- lapply(1:vm,function(i)c(i,i+vm))
  result <- melt(as.data.table(Vmat),measure=lst)[,variable:=NULL]
  })
#    user  system elapsed 
#     0.3     0.0     0.3 

So, 16,000 rows × 1,000 columns takes ~0.3s. Note that while this "uses lapply(...)", it is only used to build the measure.vars list for melt(...), which does all the work.
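To make the role of lst concrete, here is what it contains for the toy dimensions: each element pairs a column in the first half with its counterpart in the second half (1 with 6, 2 with 7, and so on), so melt(...) stacks those column pairs.

```r
vm  <- 5                                  # ncol(Vmat)/2 for the toy example
lst <- lapply(1:vm, function(i) c(i, i + vm))
str(lst)
```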

@akrun's solution (same system):

system.time({
  n <- ncol(Vmat)/2
  ar1 <- array(Vmat, dim=c(nrow(Vmat),n,ncol(Vmat)/n))
  ar2 <- aperm(ar1, c(1,3,2))
  dim(ar2) <- c(prod(dim(ar1)[c(1,3)]),n)
})
#    user  system elapsed 
#    0.38    0.00    0.37 

all.equal(as.matrix(result),ar2,check.attributes=F)
# [1] TRUE
jlhoward

We could convert the matrix to a 3-dimensional array, permute its dimensions with aperm, and then reset the dimensions to the target shape.

n <- 5
ar1 <- array(Vmat, dim=c(nrow(Vmat),n,ncol(Vmat)/n))
ar2 <- aperm(ar1, c(1,3,2))
dim(ar2) <- c(prod(dim(ar1)[c(1,3)]),n)
ar2
#      [,1] [,2] [,3] [,4] [,5]
#[1,]    4    8    3    5    9
#[2,]    5    6    6   11    6
#[3,]    4    8    3    5    9
#[4,]    3    2    7    7    4
#[5,]    5    6    6   11    6
#[6,]    8    6    4    4    3
#[7,]    8    6    4    4    3
#[8,]    3    2    7    7    4
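The three steps generalize to any block width; wrapping them in a small helper (the name stack_cols is just for illustration) makes that explicit:

```r
# split a matrix into blocks of n columns and stack the blocks row-wise
stack_cols <- function(m, n) {
  stopifnot(ncol(m) %% n == 0)
  ar1 <- array(m, dim = c(nrow(m), n, ncol(m) / n))
  ar2 <- aperm(ar1, c(1, 3, 2))   # move the block index next to the row index
  dim(ar2) <- c(nrow(m) * ncol(m) / n, n)
  ar2
}
# stack_cols(Vmat, 5) reproduces ar2 above
```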

Using @jlhoward's data,

system.time({
  n <- 5
  ar1 <- array(Vmat, dim=c(nrow(Vmat),n,ncol(Vmat)/n))
  ar2 <- aperm(ar1, c(1,3,2))
  dim(ar2) <- c(prod(dim(ar1)[c(1,3)]),n)
})
# user  system elapsed 
# 0.265   0.015   0.279 
akrun
  • Isn't this using apply? If the matrix is enormous this will be slow right? – wolfsatthedoor Sep 26 '15 at 03:37
  • You can't compare execution time across systems. Your approach runs in 0.38sec in my system, about 25% slower. – jlhoward Sep 26 '15 at 04:40
  • @jlhoward I didn't compare with your approach. I was trying to get an estimate of how fast my approach runs. It is more or less similar. – akrun Sep 26 '15 at 04:43
  • The difference is trivial with 1000 cols, but with 20MM cols (assuming enough memory, etc.), it is fairly substantial. It is also possible (probable) that the two approaches scale differently. – jlhoward Sep 26 '15 at 04:50
  • @jlhoward It is possible, but I just compared the big data you showed. – akrun Sep 26 '15 at 04:51