selecting multiple criteria in R (bis)

Question

I'm back with a question for which I had an answer that worked on an example, but not on my data. From the data.frame "data" below, with individuals (id) tested 3 times (T = 1, 2 or 3), I would like to build a new data.frame "data2" containing the individuals for whom the value of the Y variable is "yes" at all three time points.

> data <- data.frame(id = rep(c(1:10), 3),
                 T  = gl(3, 10),
                 X  = sample(1:30),
                 Y  = sample(c("yes", "no"), 30, replace = TRUE),
                 Z  = sample(1:40, 30),
                 Z2 = rnorm(30, mean = 5, sd = 0.5))

> head(data)
  id T  X   Y  Z       Z2
1  1 1 10 yes 15 5.993605
2  2 1 18  no 22 6.096566
3  3 1  5  no 24 5.101393
4  4 1 15 yes 18 4.944108
5  5 1 23  no 34 4.634176
6  6 1 13  no 27 5.576015
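For reference, the stated goal can also be written as a single grouped test in base R. This is an alternative sketch, not a method from this thread: `ave()` applies `all()` to the `Y == "yes"` values of each `id`, so a row is kept only if every row of its individual is "yes". It assumes, as in the example, that each id has exactly one row per time point; the `set.seed()` call is added only to make the sketch reproducible.

```r
# Alternative sketch (not from the thread): keep every row of an individual
# whose Y is "yes" in all of its rows, assumed to be one row per time point.
set.seed(1)  # assumption: arbitrary seed, only for reproducibility
data <- data.frame(id = rep(1:10, 3),
                   T  = gl(3, 10),
                   X  = sample(1:30),
                   Y  = sample(c("yes", "no"), 30, replace = TRUE),
                   Z  = sample(1:40, 30),
                   Z2 = rnorm(30, mean = 5, sd = 0.5))

# ave() returns, for every row, all(Y == "yes") computed within that row's id
all_yes <- ave(data$Y == "yes", data$id, FUN = all)
data2 <- data[all_yes, ]
```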

Instead of following a good suggestion that didn't quite work, I would like to select separately the rows where Y is "yes" when T == "1", and then do the same for T == "2" and T == "3", like this:

> data1y <- subset(data, T == "1" & Y == "yes")
> data2y <- subset(data, T == "2" & Y == "yes")
> data3y <- subset(data, T == "3" & Y == "yes")

At that point, I would need to match "id" across these 3 data.frames, keep only the ids that appear in all three, and put those rows back into a new data.frame. Can someone help me with this last step? Thanks
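One way to do that last step, sketched here as an illustration rather than taken from the answers below: intersect the `id` columns of the three subsets with `Reduce(intersect, ...)`, then keep the rows of `data` whose id is in that intersection. The seed and the name `common_ids` are assumptions added for the example.

```r
set.seed(1)  # assumption: arbitrary seed, only for reproducibility
data <- data.frame(id = rep(1:10, 3),
                   T  = gl(3, 10),
                   X  = sample(1:30),
                   Y  = sample(c("yes", "no"), 30, replace = TRUE),
                   Z  = sample(1:40, 30),
                   Z2 = rnorm(30, mean = 5, sd = 0.5))

# The three per-time-point subsets from the question
data1y <- subset(data, T == "1" & Y == "yes")
data2y <- subset(data, T == "2" & Y == "yes")
data3y <- subset(data, T == "3" & Y == "yes")

# ids present in all three subsets (hypothetical name: common_ids)
common_ids <- Reduce(intersect, list(data1y$id, data2y$id, data3y$id))

# Keep all rows of those individuals; they are "yes" at every time point
data2 <- data[data$id %in% common_ids, ]
```

`Reduce()` keeps the same code working if more time points are added: only the list of subsets changes.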

I opened this new question on the advice of someone on the forum. Why -2? — den, Jun 02 '13 at 18:54
StackOverflow is not a forum. This question is identical to your previous question. — joran, Jun 02 '13 at 18:59
I don't understand the difference it makes, but OK. So how can I bring back an old question that never got a good solution, despite helpful tips? — den, Jun 02 '13 at 19:14
You admitted that my solution to your previous (identical) question was correct: that it worked for the example you provided, but that it did not work with your real data. I suggested you give a small example of your real data, with whatever makes my answer not specific enough. Yet, you have posted here the same question with the same sample data... — flodel, Jun 02 '13 at 19:20

score 1 · Accepted Answer · edited Jun 02 '13 at 18:15


From your sample data:

> data[data$T %in% c(1:3) & data$Y=='yes',]
   id T  X   Y  Z             Z2
1   1 1 20 yes 33 4.802216126170
5   5 1 11 yes 38 4.961652111819
6   6 1 16 yes 39 5.280062964072
8   8 1  9 yes 10 4.390774184018
10 10 1  2 yes 24 5.304658353230
11  1 2 28 yes 16 5.431195694915
12  2 2 10 yes 14 4.719670597678
13  3 2 27 yes  3 4.568885260296
14  4 2  4 yes 32 5.699626145087
15  5 2 19 yes 21 5.378941823200
17  7 2  5 yes 34 5.144265923191
18  8 2  1 yes  8 5.138866423019
19  9 2 29 yes 35 5.938777921967
20 10 2 18 yes 30 5.562200417288
24  4 3  6 yes 23 4.723790836659
26  6 3 25 yes 29 5.915660736770
28  8 3  8 yes 19 5.133772600848

If that doesn't sort you out, leave a comment...
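The filter above returns every "yes" row (the `T %in% c(1:3)` condition is always true here), but not yet only the individuals that are "yes" at all three time points. In the output above, only id 8 appears at T = 1, 2 and 3. A sketch of the extra step, counting the remaining rows per id with `table()`, assuming each id has at most one row per time point:

```r
set.seed(1)  # assumption: arbitrary seed, only for reproducibility
data <- data.frame(id = rep(1:10, 3),
                   T  = gl(3, 10),
                   X  = sample(1:30),
                   Y  = sample(c("yes", "no"), 30, replace = TRUE),
                   Z  = sample(1:40, 30),
                   Z2 = rnorm(30, mean = 5, sd = 0.5))

# Step 1: the answer's filter, keeping the "yes" rows at any time point
data1 <- data[data$T %in% 1:3 & data$Y == "yes", ]

# Step 2: an id with 3 "yes" rows was "yes" at T = 1, 2 and 3
counts <- table(data1$id)
data2 <- data1[data1$id %in% names(counts)[counts == 3], ]
```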

edited Jun 02 '13 at 18:15 by Tyler Rinker · answered Jun 02 '13 at 18:14 by hd1

Thanks for the edit, matey... – hd1 Jun 02 '13 at 18:17
I think they are looking for one additional step. Something like `data1[data1$id %in% names(which(table(data1$id) == 3)), ]` where `data1` is your `data.frame` above. – A5C1D2H2I1M1N2O1R2T1 Jun 02 '13 at 18:19
Thank you hd1 and Ananda Mahto: your answers complement each other perfectly and work very well. – den Jun 02 '13 at 18:48
A question on this: why is there a "," at the end of both of your conditional expressions? Thank you! – den Jun 02 '13 at 18:50
Because if you don't have it, it gives "undefined columns selected". – hd1 Jun 02 '13 at 20:23
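To illustrate the comma, here is a small sketch on a hypothetical toy data frame `df`. With two arguments, `df[rows, ]` selects rows and keeps all columns. With one argument, `df[x]` indexes the data frame like a list of columns, so a logical vector with one element per row is read as a column selector; when it has `TRUE` beyond the last column, R stops with "undefined columns selected".

```r
# Hypothetical toy data: 4 rows, 2 columns
df <- data.frame(a = 1:4, b = c("x", "y", "x", "y"))
keep <- df$b == "x"  # one logical per row: TRUE FALSE TRUE FALSE

# Two-argument form: rows 1 and 3, all columns
rows <- df[keep, ]

# One-argument form: keep is read as a column selector; its TRUE in
# position 3 points past the 2 columns, so this errors
res <- tryCatch(df[keep], error = function(e) conditionMessage(e))
```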
