I think I have a novel question that, try as I might, I have been unable to solve. I have been using this site for several months to learn R and, until now, have been able to solve every question I've had. I am doing a large retrospective cohort study, and let's just say our sample looks something like this:
set.seed(1)  # fix the random draws so the example is reproducible
my.df <- data.frame(ID   = sample(c(1, 2, 3), 10, replace = TRUE),
                    Date = seq(as.Date("2012-08-01"),
                               as.Date("2012-11-01"), 1)[sample(1:10, 10)],
                    ICD  = c(401.3, 401.3, 250.02, 250.02, 110.1,
                             110.1, 250.02, 250.02, 250.02, 112.1))
What I need to do is select the IDs that have a specific diagnosis (let's say 250.02) on two consecutive visits. As a first step, I used code similar to this:
my.df <- with(my.df, my.df[order(ID, Date), ])
to sort the data by ID and then by date within each ID. My next step, I think, is to write either a loop or a function with ddply to pick out consecutive visits with the same ICD code. The first problem is that I'm working on crappy computers with a VERY large data set, and I'm afraid a loop will be so memory intensive that the computers will either freeze or crash. The second problem is that, up until now, I have gotten by mostly with vectorized operations, and my loop/function programming skills are lacking at best. Any suggestions on how to solve this problem efficiently would be appreciated.
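To make the goal concrete, here is a rough sketch of the kind of loop-free selection I am after: once the rows are sorted by ID and date, each row is compared with the row directly before it. Everything in it is an assumption for illustration only: the hand-made data frame `visits`, the choice to store ICD as character, and the helper names `target`, `hit`, `prev_hit` and `same_id_as_prev` are not part of my real data set.

```r
# Hand-made toy data (hypothetical), same columns as my.df
visits <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 3, 3),
  Date = as.Date(c("2012-08-01", "2012-08-05", "2012-08-09",
                   "2012-08-02", "2012-08-03", "2012-08-07",
                   "2012-08-04", "2012-08-06")),
  ICD  = c("250.02", "250.02", "401.3",
           "250.02", "110.1", "250.02",
           "401.3", "250.02"),
  stringsAsFactors = FALSE
)

# Sort by ID, then by visit date within each ID
visits <- visits[order(visits$ID, visits$Date), ]

target <- "250.02"   # diagnosis of interest
n <- nrow(visits)

# TRUE where the visit carries the target code
hit <- visits$ICD == target

# Shift by one row to look at the previous visit;
# the first row has no predecessor, so it gets FALSE
prev_hit        <- c(FALSE, hit[-n])
same_id_as_prev <- c(FALSE, visits$ID[-1] == visits$ID[-n])

# A visit counts when it and the previous visit of the same ID both have the code
consecutive <- hit & prev_hit & same_id_as_prev

ids_with_consecutive <- unique(visits$ID[consecutive])
ids_with_consecutive
```

In this toy data only ID 1 has the code on two visits in a row; ID 2 has it twice, but with a different diagnosis in between. The comparison uses only whole-vector operations, so it should scale with the number of rows rather than needing a per-ID loop, though I have not tried it on the full data set.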