I'm working a large matrix (187,682,789 x 5)
Say it's build like this:
Day1 <- rep(1, 10)
Lat=sample(30:33, 10, replace=T)
Lon=sample(-30:-33, 10, replace=T)
Var=runif(10,1,100)
Mat1<-cbind(Day1,Lat,Lon,Var)
Day2 <- rep(2, 10)
Lat=sample(30:33, 10, replace=T)
Lon=sample(-30:-33, 10, replace=T)
Var=runif(10,1,100)
Mat2<-cbind(Day2,Lat,Lon,Var)
#... And so on, but let's stick to 2 days for the example
Mat = rbind(Mat1,Mat2)
Of course here, there is a redundancy in the number of unique Lat Lon combinaison.
position=cbind(Mat[,2],Mat[,3]) # Lat Lon
nrow(unique(position)) < nrow(position) #True
I would like to obtain a matrix that shows all the unique Lat Lon combinaison followed by all the corresponding variable per day.
For example:
> Mat
Day Lat Lon Var
[1,] 1 36 -36 51.086210
[2,] 1 37 -37 48.486008
[3,] 1 38 -38 39.482635
[4,] 1 39 -39 97.848232
[5,] 1 40 -40 71.076543
[6,] 2 31 -31 5.641855
[7,] 2 32 -32 62.124584
[8,] 2 33 -33 39.524119
[9,] 2 34 -34 7.214646
[10,] 2 35 -35 94.254170
[11,] 2 36 -36 40.615783
[12,] 2 37 -37 71.319719
[13,] 2 38 -38 81.775119
[14,] 2 39 -39 49.224411
[15,] 2 40 -40 80.813237
Would become:
>Resulting.Mat.Var
Unique.Lat Unique.Lon Day1 Day2
[1,] 36 -36 51.08621 40.615783
[2,] 37 -37 48.48601 71.319719
[3,] 38 -38 39.48264 81.775119
[4,] 39 -39 97.84823 49.224411
[5,] 40 -40 71.07654 80.813237
[6,] 31 -31 NA 5.641855
[7,] 32 -32 NA 62.124584
[8,] 33 -33 NA 39.524119
[9,] 34 -34 NA 7.214646
[10,] 35 -35 NA 94.254170
I tried to create a Matrix of NAs and fill it with 2 for loops, but it really takes too long !
Many thanks !
Edit: This is somewhat different than what I found on SO since it really need efficiency, all are in numeric format and there are 2 columns that form the position...
J