1

I'm working a large matrix (187,682,789 x 5)

Say it's build like this:

Day1 <- rep(1, 10)  
Lat=sample(30:33, 10,  replace=T)  
Lon=sample(-30:-33, 10, replace=T)  
Var=runif(10,1,100)  
Mat1<-cbind(Day1,Lat,Lon,Var)  


Day2 <- rep(2, 10)  
Lat=sample(30:33, 10,  replace=T)  
Lon=sample(-30:-33, 10, replace=T)  
Var=runif(10,1,100)  
Mat2<-cbind(Day2,Lat,Lon,Var) 

#... And so on, but let's stick to 2 days for the example 

Mat = rbind(Mat1,Mat2)

Of course here, there is a redundancy in the number of unique Lat Lon combinaison.

position=cbind(Mat[,2],Mat[,3]) # Lat Lon
nrow(unique(position)) < nrow(position) #True

I would like to obtain a matrix that shows all the unique Lat Lon combinaison followed by all the corresponding variable per day.

For example:

> Mat
          Day Lat Lon       Var
     [1,]   1  36 -36 51.086210
     [2,]   1  37 -37 48.486008
     [3,]   1  38 -38 39.482635
     [4,]   1  39 -39 97.848232
     [5,]   1  40 -40 71.076543
     [6,]   2  31 -31  5.641855
     [7,]   2  32 -32 62.124584
     [8,]   2  33 -33 39.524119
     [9,]   2  34 -34  7.214646
    [10,]   2  35 -35 94.254170
    [11,]   2  36 -36 40.615783
    [12,]   2  37 -37 71.319719
    [13,]   2  38 -38 81.775119
    [14,]   2  39 -39 49.224411
    [15,]   2  40 -40 80.813237

Would become:

  >Resulting.Mat.Var
    Unique.Lat Unique.Lon  Day1         Day2
    [1,]  36    -36      51.08621    40.615783
    [2,]  37    -37      48.48601    71.319719
    [3,]  38    -38      39.48264    81.775119
    [4,]  39    -39      97.84823    49.224411
    [5,]  40    -40      71.07654    80.813237
    [6,]  31    -31            NA    5.641855
    [7,]  32    -32            NA    62.124584
    [8,]  33    -33            NA    39.524119
    [9,]  34    -34            NA    7.214646
    [10,] 35    -35            NA    94.254170

I tried to create a Matrix of NAs and fill it with 2 for loops, but it really takes too long !

Many thanks !

Edit: This is somewhat different than what I found on SO since it really need efficiency, all are in numeric format and there are 2 columns that form the position...

J

J Lal
  • 37
  • 6
  • 1
    Possible duplicate of [Reshape data from long to wide format R](http://stackoverflow.com/questions/5890584/reshape-data-from-long-to-wide-format-r) – dww Aug 10 '16 at 00:02

3 Answers3

2

This is a typical "long-to-wide" conversion problem. One possibility to obtain the desired form is to use dcast() from the reshape2 package:

library(reshape2)
as.matrix(dcast(as.data.frame(Mat), Lat + Lon ~ Day, value.var = "Var"))
#      Lat Lon        1         2
# [1,]  31 -31       NA  5.641855
# [2,]  32 -32       NA 62.124584
# [3,]  33 -33       NA 39.524119
# [4,]  34 -34       NA  7.214646
# [5,]  35 -35       NA 94.254170
# [6,]  36 -36 51.08621 40.615783
# [7,]  37 -37 48.48601 71.319719
# [8,]  38 -38 39.48264 81.775119
# [9,]  39 -39 97.84823 49.224411
#[10,]  40 -40 71.07654 80.813237

Quite a few similar questions have been answered before on SO, so this is probably a duplicate. However, most questions refer to data.frame structures, and not to matrices.

data:

Mat <- structure(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 36, 
            37, 38, 39, 40, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, -36, 
            -37, -38, -39, -40, -31, -32, -33, -34, -35, -36, -37, -38, -39, 
            -40, 51.08621, 48.486008, 39.482635, 97.848232, 71.076543, 5.641855, 
            62.124584, 39.524119, 7.214646, 94.25417, 40.615783, 71.319719, 
            81.775119, 49.224411, 80.813237), .Dim = c(15L, 4L), 
           .Dimnames = list(NULL, c("Day", "Lat", "Lon", "Var")))
RHertel
  • 23,412
  • 5
  • 38
  • 64
  • Worked great although I had to split the original matrix in several smaller ones, but I combined them back afterwards and got the same result so thanks !! – J Lal Aug 12 '16 at 12:05
  • @JLal The splitting of the matrix sounds somewhat dangerous. I assume that it was necessary because of RAM limits. Probably you had to make sure that after the split the data belonging to one day was not distributed on different matrices. Anyway, I'm glad to hear that it worked. – RHertel Aug 12 '16 at 12:21
  • @JLal Please consider [accepting one of the answers](http://meta.stackexchange.com/q/5234) if it helped solving your problem. – RHertel Aug 12 '16 at 12:50
1

Another method using dplyr is:

library(dplyr)
Resulting.Mat.Var <- as.matrix(
  Mat %>% group_by(Unique.Lat=Lat,Unique.Lon=Lon) %>% 
          summarise(Day1=Var[which(Day==1)], Day2=Var[which(Day==2)]))

print(Resulting.Mat.Var)
##      Unique.Lat Unique.Lon     Day1      Day2
## [1,]         31        -31       NA  5.641855
## [2,]         32        -32       NA 62.124584
## [3,]         33        -33       NA 39.524119
## [4,]         34        -34       NA  7.214646
## [5,]         35        -35       NA 94.254170
## [6,]         36        -36 51.08621 40.615783
## [7,]         37        -37 48.48601 71.319719
## [8,]         38        -38 39.48264 81.775119
## [9,]         39        -39 97.84823 49.224411
##[10,]         40        -40 71.07654 80.813237
aichao
  • 7,375
  • 3
  • 16
  • 18
  • Mmm but what if you have plenty of different (and almost uncountable) days ? Also, I am no so confortable with the dplyr lib and such %>% operators are unfamiliar but I'll look into it for sure. Thanks ! – J Lal Aug 12 '16 at 12:10
1

Looks like a merge to me:

> merge( Mat[Mat[,'Day']==1 , -1], Mat[ Mat[,'Day']==2, -1], by=c(1,2) , all=TRUE)
   Lat Lon    Var.x     Var.y
1   31 -31       NA  5.641855
2   32 -32       NA 62.124584
3   33 -33       NA 39.524119
4   34 -34       NA  7.214646
5   35 -35       NA 94.254170
6   36 -36 51.08621 40.615783
7   37 -37 48.48601 71.319719
8   38 -38 39.48264 81.775119
9   39 -39 97.84823 49.224411
10  40 -40 71.07654 80.813237

Can coerce to matrix if needed since that result is a data.frame

IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • I like that. But I think that the OP indicated that the original data contains more than two days. – RHertel Aug 10 '16 at 03:08
  • Then would be thinking of a `Reduce` call to `merge`. – IRTFM Aug 10 '16 at 05:39
  • Yeah as RHertel mentionned this needed to be done with several days, but I'll keep in mind the 'merge' tool, at least for the context when I define my problem. Thank you ! – J Lal Aug 12 '16 at 12:07