
I have a large 2D matrix that is 1000 x 1000. I want to reshape this so that it is one column (or row). For example, if the matrix was:

A B C
1 4 7
2 5 8
3 6 9

I want to turn it into:

1 2 3 4 5 6 7 8 9

I do not need to preserve the column headers, just the order of the data. How do I do this using reshape2 (which is the package that I presumed was the easiest to use)?


Just to clarify, I mentioned reshape as I thought it was the best way of doing this. I can see that there are simpler methods which I am perfectly happy with.

Yamaneko
djq
  • Whenever you vectorize a matrix, keep in mind that it always goes columns first. When you need to preserve the row order, then do `c(t(some.matrix))`. – Joris Meys Dec 31 '10 at 15:52
  • Changed the title to reflect the question asked. BTW, I wonder where that reshape-fetish is coming from. I see so many questions asking for a reshape answer to a problem for which reshape never was built in the first place. – Joris Meys Dec 31 '10 at 15:55
  • @Joris perhaps "If all you have is a hammer, everything looks like a nail."? – Joshua Ulrich Dec 31 '10 at 16:03
  • @Joris - ignorance really. I just assumed what I wanted to do was not a standard operation. I use ggplot2 where reshape2 is sometimes mentioned as they are both made by Hadley Wickham. – djq Dec 31 '10 at 17:57
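
The column-first behaviour mentioned in the first comment can be checked with a short sketch. It rebuilds the 3 x 3 example from the question; the names `by_col` and `by_row` are only illustrative.

```r
# Example matrix from the question (values fill column by column)
m <- matrix(1:9, ncol = 3, dimnames = list(NULL, c("A", "B", "C")))

# c() drops the dim and dimnames attributes and reads the data column by column
by_col <- c(m)

# Transposing first makes the same call read the original matrix row by row
by_row <- c(t(m))

print(by_col)  # 1 2 3 4 5 6 7 8 9
print(by_row)  # 1 4 7 2 5 8 3 6 9
```

The question asks for the first ordering, so the transpose is only needed when rows should stay together.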

5 Answers


I think it will be difficult to find a more compact method than:

c(m)
[1] 1 2 3 4 5 6 7 8 9

However, if you want to retain a matrix structure, then this reworking of the dim attribute would be effective:

dim(m) <- c(dim(m)[1]*dim(m)[2], 1)
m
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9

There are more compact ways of getting the product of the dimensions, but the method above emphasizes that the dim attribute is a two-element vector for matrices. Other ways of getting the "9" in that example include:

> prod(dim(m))
[1] 9
> length(m)
[1] 9
IRTFM
  • you can just do `cbind(c(m))` to make it a one-column matrix – Prasad Chalasani Dec 31 '10 at 15:53
  • @hadley OK, what about prod(dim(m))? – IRTFM Dec 31 '10 at 19:19
  • `dim(m) <- c(prod(dim(m)), 1)` is a bit nicer, and scales to any number of dimensions – hadley Jan 04 '11 at 14:18
  • That was what I intended a reader to do. The code `prod(dim(m))` was offered as a replacement for the clunkier: `dim(m)[1]*dim(m)[2]` as a way of getting to 9. It was always intended to go into `dim(m)<-c(prod(dim(m)), 1)` and I guess that was why I couldn't figure out your comment. – IRTFM Jan 04 '11 at 15:15
  • For anyone with a `data.frame`, `unlist(df)` works. – kdauria Oct 21 '15 at 18:48
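
The variants suggested in these comments can be tried side by side. This is a minimal sketch on small made-up inputs; the 2 x 3 x 4 array `a` and the data frame `df` are hypothetical examples, not taken from the question.

```r
m <- matrix(1:9, ncol = 3)

# One-column matrix via cbind(), as suggested in the comments
col1 <- cbind(c(m))

# Same shape by rewriting the dim attribute on a copy
col2 <- m
dim(col2) <- c(prod(dim(col2)), 1)

# prod(dim(x)) does not care how many dimensions there are,
# so the same line flattens a 3-d array (hypothetical example)
a <- array(1:24, dim = c(2, 3, 4))
dim(a) <- c(prod(dim(a)), 1)

# For a data.frame, unlist() concatenates the columns in order;
# unname() drops the generated element names (A1, A2, ...)
df <- data.frame(A = 1:3, B = 4:6, C = 7:9)
v <- unname(unlist(df))

print(dim(col1))
print(dim(col2))
print(dim(a))
print(v)
```

Assigning to `dim` changes the object it is applied to, which is why the sketch works on the copy `col2` rather than on `m` itself.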

A possible solution, but without using reshape2:

> m <- matrix(c(1:9), ncol = 3)
> m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> as.vector(m)
[1] 1 2 3 4 5 6 7 8 9
EDi

Come on R guys, let's give the OP a reshape2 solution:

> m <- matrix(c(1:9), ncol = 3)
> melt(m)$value
[1] 1 2 3 4 5 6 7 8 9

I just can't be bothered to test how much slower it is than c(m). The result is the same, though:

> identical(c(m),melt(m)$value)
[1] TRUE

[EDIT: oh heck who am I kidding:]

> system.time(for(i in 1:1000){z=melt(m)$value})
   user  system elapsed 
  1.653   0.004   1.662 
> system.time(for(i in 1:1000){z=c(m)})
   user  system elapsed 
  0.004   0.000   0.004 
Spacedman
  • The reshape solution is several orders of magnitude slower when tested on a 1000 x 1000 matrix... as you can see via your edit. ;-) – Joshua Ulrich Dec 31 '10 at 16:10
  • +1 for the timings. funny reshape-hack though, I wouldn't have thought of it. For obvious reasons ;-) – Joris Meys Dec 31 '10 at 17:10
  • Just for amusement: reshape2::melt is about 25% faster than reshape::melt (approx. 7.7 vs 10.3 seconds for 10000 reps) although still about 400 times slower than c(m) ... – Ben Bolker Jan 01 '11 at 14:53
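
To repeat the comparison at the size the question mentions, something like the following could be used. It is only a sketch: the random 1000 x 1000 matrix is made up, the reshape2 part is skipped when the package is not installed, and the timings will differ from machine to machine, so none are shown.

```r
# A 1000 x 1000 matrix of random values (hypothetical test data)
m <- matrix(runif(1e6), nrow = 1000)

# Base R: flatten by dropping the dim attribute
t_base <- system.time(v1 <- c(m))["elapsed"]
print(t_base)

# reshape2 route, run only if the package is available
if (requireNamespace("reshape2", quietly = TRUE)) {
  t_melt <- system.time(v2 <- reshape2::melt(m)$value)["elapsed"]
  print(t_melt)
  print(identical(v1, v2))  # checks that both routes give the same values
}
```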

as.vector(m) should be a little more efficient than c(m):

> library(rbenchmark)
> m <- diag(5000)
> benchmark(
+   vect = as.vector(m), 
+   conc = c(m), 
+   replications=100
+ )
  test replications elapsed relative user.self sys.self user.child sys.child
2 conc          100  12.699    1.177     6.952    5.754          0         0
1 vect          100  10.785    1.000     4.858    5.933          0         0
df239

One more simple way to do it is with the function "sapply" (the same could be done with a 'for' loop as well):

 m <- matrix(c(1:9), ncol = 3)
 # Loop over the columns; seq_len(ncol(m)) also works for non-square matrices
 (m1 <- as.numeric(sapply(seq_len(ncol(m)), function(i) m[, i])))
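
The 'for' loop variant mentioned above might look like this. It is a sketch, with `out`, `n` and `j` as illustrative names; it copies one column at a time into a preallocated vector.

```r
m <- matrix(1:9, ncol = 3)

# Preallocate a vector of the same type and total length as the matrix
out <- vector(mode = typeof(m), length = length(m))
n <- nrow(m)

# Copy column j into positions (j - 1) * n + 1 through j * n
for (j in seq_len(ncol(m))) {
  out[(j - 1) * n + seq_len(n)] <- m[, j]
}

print(out)
print(identical(out, c(m)))
```

Unlike the `as.numeric()` call above, this keeps the original storage type (integer here), which is why it can be compared to `c(m)` with `identical()`.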
user36478