Here is my dataset:
mydata = data.frame (Id =c(1,1,1,1,1,1,1,1,1,1),
Date = c("2000-01-01","2000-01-05","2000-02-02", "2000-02-12",
"2000-02-14","2000-05-13", "2000-05-15", "2000-05-17",
"2000-05-16", "2000-05-20"),
drug = c("A","A","B","B","B","A","A","A","C","C"))
The below code tells me the difference between dates of administration grouped by ID and drug. As you can see for drug A there is a gap >100 days between dates administered.
mydata <- mydata %>% group_by(Id, drug) %>% mutate(Diff = difftime(Date, lag(Date), units = 'days'))
The task is to group by id and drug and take the earliest and latest date of administration for each drug, but if there is a >100 day gap between the dates between the same type of drug then it needs to have it own earliest and latest date row.
The following code allows me to get the earliest and latest date, but I'm not sure how to add the 100 day gap here.
mydata %>% group_by(Id, drug) %>%
summarise(startDate = min(as.Date(Date),na.rm = T),
endDate = max(as.Date(Date),na.rm = T))
The below is the output of what im hoping to get
mydata1 = data.frame (Id =c(1,1,1,1),
startDate = c("2000-01-01","2000-02-02","2000-05-13", "2000-05-16"),
endDate = c("2000-01-05", "2000-02-14", "2000-05-17", "2000-05-20"),
drug = c("A","B","A","C"))
As you can see for drug A there is two rows representing the first start and end date and then the second start and end date after >100 days passed between administration dates.
Any help will be really appreciated! Thank you