I'm coming back with a question for which I had an answer that worked on an example, but not on my own data. From the data.frame "data" proposed below, with individuals (id) tested 3 times (T = 1, 2 or 3), I would like to build a new data.frame "data2" containing the individuals for whom the value of the Y variable is "yes" at all three time points.
> data <- data.frame(id = rep(1:10, 3),
                     T = gl(3, 10),
                     X = sample(1:30),
                     Y = sample(c("yes", "no"), 30, replace = TRUE),
                     Z = sample(1:40, 30),
                     Z2 = rnorm(30, mean = 5, sd = 0.5))
> head(data)
  id T  X   Y  Z       Z2
1  1 1 10 yes 15 5.993605
2  2 1 18  no 22 6.096566
3  3 1  5  no 24 5.101393
4  4 1 15 yes 18 4.944108
5  5 1 23  no 34 4.634176
6  6 1 13  no 27 5.576015
Rather than pursuing an earlier suggestion, which was good but didn't quite work on my data, I would like to select separately the rows where Y is "yes" when T == "1", and then do the same for T == "2" and T == "3", like this:
> data1y <- subset(data, T=="1"&Y=="yes")
> data2y <- subset(data, T=="2"&Y=="yes")
> data3y <- subset(data, T=="3"&Y=="yes")
At this point, I need to match "id" across these 3 data.frames, keep only the ids that appear in all three, and put the corresponding rows into a new data.frame. Can someone help me with this last step? Thanks.
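
For reference, the matching step described above could be sketched roughly as follows. This is only one possible approach, assuming each id appears exactly once per time point; the name `common_ids` and the `set.seed()` call are illustrative additions, not part of the original data.

```r
# Rebuild the example data; set.seed() is added only to make the sketch reproducible
set.seed(1)
data <- data.frame(id = rep(1:10, 3),
                   T = gl(3, 10),
                   X = sample(1:30),
                   Y = sample(c("yes", "no"), 30, replace = TRUE),
                   Z = sample(1:40, 30),
                   Z2 = rnorm(30, mean = 5, sd = 0.5))

# Per-time-point subsets, as in the question
data1y <- subset(data, T == "1" & Y == "yes")
data2y <- subset(data, T == "2" & Y == "yes")
data3y <- subset(data, T == "3" & Y == "yes")

# ids present in all three subsets (common_ids is a hypothetical helper name)
common_ids <- Reduce(intersect, list(data1y$id, data2y$id, data3y$id))

# Keep every row of the original data belonging to those ids
data2 <- data[data$id %in% common_ids, ]
```

If no id has "yes" at all three time points, `common_ids` is empty and `data2` simply has zero rows.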