68

I have a list of length 130,000 where each element is a character vector of length 110. I would like to convert this list to a matrix with dimensions 1,430,000*10. How can I do this more efficiently? My code is:

output=NULL
for(i in 1:length(z)) {
 output=rbind(output,
              matrix(z[[i]],ncol=10,byrow=TRUE))
}
Ben Bolker
user1787675
  • 2
    If you want the dimensions to be 1430000*11 why do you set ncol to be 10? – Dason Nov 05 '12 at 00:46
  • 1
    Wait- when you say that each entry has 11 characters, you mean that it is a vector with 11 items? I originally thought that each was a string with 11 characters in it. Can you show `z[1:2]` as an example? – David Robinson Nov 05 '12 at 00:51
  • Thanks, Dason and David! That was a typo. I have corrected it. – user1787675 Nov 05 '12 at 00:54
  • @user1787675: I still don't understand. What is an "entry"? Is it a vector? Can you show `z[1:2]`? – David Robinson Nov 05 '12 at 01:07
  • Hi David, I looked it up in a dictionary and found that I mean the components of the list. I am sorry for the confusion I caused. I am not good at English :) – user1787675 Nov 05 '12 at 01:28

5 Answers

146

This should be equivalent to your current code, only a lot faster:

output <- matrix(unlist(z), ncol = 10, byrow = TRUE)
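
A small check of why this works, using a hypothetical three-element list (`z_small`) in place of the real `z`: `unlist()` concatenates the elements in order, and `byrow = TRUE` then fills the matrix ten values at a time, so each length-110 vector should end up as 11 consecutive rows.

```r
# Hypothetical stand-in for the real data: 3 character vectors of length 110
z_small <- lapply(1:3, function(i) paste0("e", i, "_", 1:110))

# Loop version from the question
loop_out <- NULL
for (i in seq_along(z_small)) {
  loop_out <- rbind(loop_out, matrix(z_small[[i]], ncol = 10, byrow = TRUE))
}

# Vectorised version: flatten once, then reshape row by row
fast_out <- matrix(unlist(z_small), ncol = 10, byrow = TRUE)

dim(fast_out)                  # 33 rows, 10 columns
identical(loop_out, fast_out)  # TRUE
```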
flodel
16

I think you want

output <- do.call(rbind,lapply(z,matrix,ncol=10,byrow=TRUE))

i.e. combining @BlueMagister's use of do.call(rbind,...) with an lapply statement to convert the individual list elements into 11*10 matrices ...
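
As a rough illustration on a hypothetical two-element list (`z_small` is not from the original post), the intermediate step produces one 11*10 block per element, and a single `rbind` call then stacks the blocks:

```r
# Hypothetical toy input: two integer vectors of length 110
z_small <- list(1:110, 111:220)

# Step 1: reshape every element into an 11 x 10 matrix, filled by row
blocks <- lapply(z_small, matrix, ncol = 10, byrow = TRUE)
sapply(blocks, dim)  # one column per block, each holding 11 and 10

# Step 2: stack all blocks with one rbind() call instead of one per iteration
out <- do.call(rbind, blocks)
dim(out)     # 22 rows, 10 columns
out[12, 1]   # 111, the first value of the second element
```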

Benchmarks (showing that @flodel's unlist solution is about 4.5x faster than mine, and about 230x faster than the original approach ...)

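library(rbenchmark)

n <- 1000
z <- replicate(n, matrix(1:110, ncol = 10, byrow = TRUE), simplify = FALSE)

# Original approach: grow the result with rbind() inside a loop
origfn <- function(z) {
    output <- NULL
    for (i in seq_along(z))
        output <- rbind(output, matrix(z[[i]], ncol = 10, byrow = TRUE))
    output
}
# Reshape each element, then bind everything in a single call
rbindfn <- function(z) do.call(rbind, lapply(z, matrix, ncol = 10, byrow = TRUE))
# Flatten the list once and reshape
unlistfn <- function(z) matrix(unlist(z), ncol = 10, byrow = TRUE)

# Benchmark call is assumed here; the post shows only the output below
benchmark(origfn(z), rbindfn(z), unlistfn(z), replications = 100)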

##          test replications elapsed relative user.self sys.self 
## 1   origfn(z)          100  36.467  230.804    34.834    1.540  
## 2  rbindfn(z)          100   0.713    4.513     0.708    0.012 
## 3 unlistfn(z)          100   0.158    1.000     0.144    0.008 

If this scales appropriately (i.e. you don't run into memory problems), the full problem would take about 130*0.2 seconds = 26 seconds on a comparable machine (I did this on a 2-year-old MacBook Pro).

Ben Bolker
  • That's magical! It takes about 20 seconds to do this on my one-year-old toshiba machine, which saves me a lot of time. And your function to show the run time is very interesting too. – user1787675 Nov 05 '12 at 01:44
8

It would help to have sample information about your output. Recursively using rbind on bigger and bigger things is not recommended. My first guess at something that would help you:

z <- list(1:3,4:6,7:9)
do.call(rbind,z)

See a related question for more efficiency, if needed.
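
One common way to avoid growing the result with repeated `rbind` calls is to allocate the full matrix once and fill it block by block. The following is only a sketch on a hypothetical toy list (`z_small` and `rows_per` are illustrative names), assuming every element has length 110:

```r
# Hypothetical toy input: three integer vectors of length 110
z_small <- list(1:110, 111:220, 221:330)
rows_per <- 11L  # 110 values / 10 columns per element

# Allocate the whole result up front so nothing is copied on each iteration
out <- matrix(NA_integer_, nrow = rows_per * length(z_small), ncol = 10)

for (i in seq_along(z_small)) {
  # Rows occupied by the i-th element
  idx <- ((i - 1L) * rows_per + 1L):(i * rows_per)
  out[idx, ] <- matrix(z_small[[i]], ncol = 10, byrow = TRUE)
}

# Same result as flattening the list and reshaping in one step
identical(out, matrix(unlist(z_small), ncol = 10, byrow = TRUE))  # TRUE
```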

Community
Blue Magister
4

You can also use:

output <- as.matrix(as.data.frame(z))

The memory usage is very similar to

output <- matrix(unlist(z), ncol = 10, byrow = TRUE)

This can be verified with mem_change() from the pryr package.
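
It is worth checking the shape of the result before relying on this approach. In the sketch below, which uses a hypothetical list of three length-20 vectors (`z_small`), the data frame route appears to give one column per list element rather than the 10-column layout asked for in the question. `object.size()` from base R is used here for a rough size comparison; it is a stand-in for illustration, not the pryr function mentioned above.

```r
# Hypothetical toy input: three integer vectors of length 20
z_small <- list(1:20, 21:40, 41:60)

via_df     <- as.matrix(as.data.frame(z_small))
via_unlist <- matrix(unlist(z_small), ncol = 10, byrow = TRUE)

dim(via_df)      # 20 rows, 3 columns: one column per list element
dim(via_unlist)  # 6 rows, 10 columns: the layout from the question

# Rough size comparison of the two results
object.size(via_df)
object.size(via_unlist)
```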

csta
-6

You can use as.matrix as below:

output <- as.matrix(z)
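
A quick check on a hypothetical two-element list (`z_small`) suggests that, when `z` is a plain list, this returns a one-column matrix whose cells are still the original vectors, not the 10-column character matrix requested:

```r
# Hypothetical toy input: two short character vectors
z_small <- list(c("a", "b"), c("c", "d"))

m <- as.matrix(z_small)

dim(m)      # 2 rows, 1 column: one cell per list element
is.list(m)  # TRUE: each cell still holds a whole vector
m[[1, 1]]   # the first original vector, c("a", "b")
```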
Ahmed Gehad