1

I would like to subset the data of one matrix using data in a second matrix. The columns of one matrix are labeled. For example,

area1 <- c(9836374,635440,23018,833696,936079,1472449,879042,220539,870581,217418,552303,269359,833696,936079,1472449,879042,220539,870581, 833696,936079,1472449,879042,220539,870581)
id <- c(1,2,5,30,31,34,1,2,5,1,2,5,1,2,5,30,31,34,51,52,55,81,82,85)
mat1 <- matrix(area1, ncol=3, byrow=TRUE)
mat2 <- matrix(id, ncol=3, byrow=TRUE)
dimnames(mat1) <- list(NULL, c("a1","a2","a3"))

mat2 contains the ids for mat1, so the dimensions of the two matrices are the same (i.e., mat1[1,1] identifies mat2[1,1]). What I want is to create submatrices of mat1 when the row with values c(1, 2, 5) shows up in mat2. In this present mini example, submatrix 1 would have 2 rows of data, submatrix 2 and 3 have 1 row each, and submatrix 4 would have 4 rows of data from mat1. The number of rows between subsequent rows with 1,3,5 varies. Does this make sense?

Originally, the matrices were transformed from a dataframe, with id in one column and area in a second column. I couldn't find a way to subset variable rows between rows of 1 within a dataframe, which is why I switched to a matrix.

IRTFM
  • 2
    Welcome to SO and thank you for the well specified question. Can you share a bit of the code you have tried so far and how it hasn't worked ? Also, can you include exactly what your anticipated output should look like? – Justin Sep 11 '13 at 22:08
  • 1
    Thanks for the nice example! Could you please also post your desired output, i.e. the four submatrices. This would make it much easier to understand exactly what you wish to achieve. Cheers. – Henrik Sep 11 '13 at 22:14

4 Answers

2

I think this covers it and fits with your description:

spl <- cumsum(apply(mat2,1, function(x) all(x==c(1,2,5))))
split(as.data.frame(mat1),spl)

#$`1`
#       a1     a2      a3
#1 9836374 635440   23018
#2  833696 936079 1472449
# 
#$`2`
#      a1     a2     a3
#3 879042 220539 870581
#
#$`3`
#      a1     a2     a3
#4 217418 552303 269359
#
#$`4`
#      a1     a2      a3
#5 833696 936079 1472449
#6 879042 220539  870581
#7 833696 936079 1472449
#8 879042 220539  870581

The result fits with "submatrix 1 would have 2 rows of data, submatrix 2 and 3 have 1 row each, and submatrix 4 would have 4 rows of data from mat1"
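
To make the grouping step easier to follow, here is a minimal sketch that splits the one-liner above into its two parts. The name `is_start` is only an illustrative helper, not part of the original code; `mat2` is rebuilt from the question's `id` vector so the snippet runs on its own.

```r
# Rebuild mat2 from the question so this runs stand-alone
id <- c(1,2,5,30,31,34,1,2,5,1,2,5,1,2,5,30,31,34,51,52,55,81,82,85)
mat2 <- matrix(id, ncol = 3, byrow = TRUE)

# TRUE for every row of mat2 that equals c(1, 2, 5)
# (is_start is a hypothetical helper name used for illustration)
is_start <- apply(mat2, 1, function(x) all(x == c(1, 2, 5)))
is_start
# [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE

# cumsum() goes up by one at each TRUE, so every "1 2 5" row opens a new group
spl <- cumsum(is_start)
spl
# [1] 1 1 2 3 4 4 4 4
```

Rows that share a value in `spl` end up in the same element of the list returned by `split()`.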

thelatemail
  • I have no idea how you translated that but have a +1 from me! – Simon O'Hanlon Sep 11 '13 at 22:38
  • @SimonO101 - I speak fluent 'beginner R' as a result of several years of fumbling around as a psychology grad. with little formal data training. – thelatemail Sep 11 '13 at 22:53
  • 1
    You mean `beginR` :-) – Simon O'Hanlon Sep 11 '13 at 22:55
  • It's hard to tell whether this or @eddi's answer is what the OP is looking for, but +1 for the interpretation :) – A5C1D2H2I1M1N2O1R2T1 Sep 12 '13 at 01:46
  • @AnandaMahto - I'm pretty sure I'm right - this line by the OP tipped me off "*The number of rows between subsequent rows with 1,3,5 (sic) varies*" – thelatemail Sep 12 '13 at 02:08
  • Re: Justin and Henrik. Thanks so much for the encouragement! I have been reading and slowly learning R on SO for about 6 months before I got the courage to post a question. – user2770184 Sep 12 '13 at 18:51
  • Re: thelatemail: thank you! I just found that same answer as you yesterday evening. Now I am trying to figure out how to access these groups in the list. Or is it still in dataframe format? – user2770184 Sep 12 '13 at 18:54
  • @user2770184 - if you have `result <- split(as.data.frame(mat1),spl)` you access each element like `result[[1]]`. – thelatemail Sep 12 '13 at 23:18
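
On the "is it still in dataframe format?" question: `split(as.data.frame(mat1), spl)` returns a list of data frames. If matrices are preferred, one possible sketch (an alternative, not the code from the answer above) is to split the row indices instead and subset `mat1` directly. The name `submats` is made up for this example.

```r
# Data from the question, rebuilt so the snippet runs stand-alone
area1 <- c(9836374,635440,23018,833696,936079,1472449,879042,220539,870581,
           217418,552303,269359,833696,936079,1472449,879042,220539,870581,
           833696,936079,1472449,879042,220539,870581)
id <- c(1,2,5,30,31,34,1,2,5,1,2,5,1,2,5,30,31,34,51,52,55,81,82,85)
mat1 <- matrix(area1, ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("a1", "a2", "a3")))
mat2 <- matrix(id, ncol = 3, byrow = TRUE)

spl <- cumsum(apply(mat2, 1, function(x) all(x == c(1, 2, 5))))

# Split the row numbers by group, then pull those rows out of mat1.
# drop = FALSE keeps one-row groups as 1 x 3 matrices instead of vectors.
submats <- lapply(split(seq_len(nrow(mat1)), spl),
                  function(i) mat1[i, , drop = FALSE])

submats[[1]]      # first submatrix (2 rows)
submats[["4"]]    # elements can also be accessed by group name
sapply(submats, nrow)
```

Each element of `submats` is a numeric matrix with the original column names, so it can be indexed like `mat1` itself.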
1
mat1[which(mat2[,1]==1 & mat2[,2]==2 & mat2[,3]==5),]
          a1     a2      a3
[1,] 9836374 635440   23018
[2,]  879042 220539  870581
[3,]  217418 552303  269359
[4,]  833696 936079 1472449
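
The same selection can be written without `which()` and without naming each column, which may help if the pattern has more than three values. This is only a sketch; `target` and `hits` are names introduced here for illustration.

```r
# Data from the question, rebuilt so the snippet runs stand-alone
area1 <- c(9836374,635440,23018,833696,936079,1472449,879042,220539,870581,
           217418,552303,269359,833696,936079,1472449,879042,220539,870581,
           833696,936079,1472449,879042,220539,870581)
id <- c(1,2,5,30,31,34,1,2,5,1,2,5,1,2,5,30,31,34,51,52,55,81,82,85)
mat1 <- matrix(area1, ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("a1", "a2", "a3")))
mat2 <- matrix(id, ncol = 3, byrow = TRUE)

target <- c(1, 2, 5)   # the id pattern to look for (illustrative name)

# t(mat2) has one column per original row; target is recycled down each
# column, so a column with 3 matches is a row equal to the pattern.
hits <- colSums(t(mat2) == target) == length(target)

# A logical vector indexes rows directly; drop = FALSE keeps a matrix
# even if only one row matches.
mat1[hits, , drop = FALSE]
```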
eddi
Metrics
1
split(as.data.frame(mat1), apply(mat2, 1, paste, collapse = " "))
#$`1 2 5`
#       a1     a2      a3
#1 9836374 635440   23018
#3  879042 220539  870581
#4  217418 552303  269359
#5  833696 936079 1472449
#
#$`30 31 34`
#      a1     a2      a3
#2 833696 936079 1472449
#6 879042 220539  870581
#
#$`51 52 55`
#      a1     a2      a3
#7 833696 936079 1472449
#
#$`81 82 85`
#      a1     a2     a3
#8 879042 220539 870581
eddi
0

I think from what you said, you wanted to keep it as a data frame. You can easily make submatrices by grabbing rows with certain column values.

Here, I put the data frame back together and took the subset of rows where id is 1. You can build on this by using cbind on several such "area1" subsets, one per id.

> area1 <- c(9836374,635440,23018,833696,936079,1472449,879042,220539,870581,217418,552303,269359,833696,936079,1472449,879042,220539,870581, 833696,936079,1472449,879042,220539,870581)
> id <- c(1,2,5,30,31,34,1,2,5,1,2,5,1,2,5,30,31,34,51,52,55,81,82,85)
> original<-as.data.frame(cbind(id,area1))
> original[original$id==1,]
   id   area1
1   1 9836374
7   1  879042
10  1  217418
13  1  833696

Then you can do what I said before like this.

> col1<-original[original$id==1,"area1"]
> col2<-original[original$id==2,"area1"]
> col3<-original[original$id==5,"area1"]
> submat<-cbind(col1,col2,col3)
> colnames(submat)<-c("a1","a2","a3")
> submat
          a1     a2      a3
[1,] 9836374 635440   23018
[2,]  879042 220539  870581
[3,]  217418 552303  269359
[4,]  833696 936079 1472449
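
If the goal is the variable-length groups described in the question while staying in the long data frame, a possible sketch is to start a new group at every row where id is 1. This assumes, as in the example data, that each block begins with id 1; `grp` and `groups` are names made up for this illustration.

```r
# Long-format data from the question
area1 <- c(9836374,635440,23018,833696,936079,1472449,879042,220539,870581,
           217418,552303,269359,833696,936079,1472449,879042,220539,870581,
           833696,936079,1472449,879042,220539,870581)
id <- c(1,2,5,30,31,34,1,2,5,1,2,5,1,2,5,30,31,34,51,52,55,81,82,85)
original <- data.frame(id, area1)

# Assumption: every block starts with id == 1, so counting the 1s seen
# so far labels each row with the block it belongs to.
grp <- cumsum(original$id == 1)

# One data frame per block; block sizes differ
groups <- split(original, grp)
sapply(groups, nrow)
```

With the example data the blocks hold 6, 3, 3 and 12 rows, which corresponds to 2, 1, 1 and 4 rows of the three-column matrix.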
Ben