I have a dataset with what are essentially episodes of time related to an individual which can overlap (i.e. an episode could start later but finish earlier than the previous). Because of this overlap issue I'm struggling to get the latest end_date in the sequence once they're in order by start_date.
The code I've been using works to a point but I have to repeat as shown in the code below. For that reason I guess I need some loop function to go through a process until a condition is met (that the end_date is later than the end_date on the previous row, or the id indicates a new individual).
library(dplyr)
## creates example dataframe
id <- c("A","A","A","A","A","A","A","A","A","A",
"A","A","A","B","B","B","B","B","B")
start_date <- as.Date(c("2004-01-23","2005-03-31","2005-03-31","2005-12-20","2005-12-20",
"2006-04-03","2007-11-26","2010-10-12","2011-08-08","2012-06-26",
"2012-06-26","2012-09-11","2012-10-03","2003-12-01","2006-02-28",
"2012-04-16","2012-08-30","2012-09-19","2012-09-28"))
end_date <- as.Date(c("2009-06-30","2005-09-17","2005-09-19","2005-12-30","2005-12-30",
"2006-06-19","2009-06-30","2010-11-05","2011-11-18","2012-06-26",
"2012-06-26","2012-09-11","2014-04-01","2012-08-29","2006-02-28",
"2012-04-16","2012-09-28","2013-10-11","2013-07-19"))
target_date <- as.Date(c(NA,"2009-06-30","2009-06-30","2009-06-30","2009-06-30","2009-06-30",
"2009-06-30","2009-06-30","2010-11-05","2011-11-18","2012-06-26",
"2012-06-26","2012-09-11",NA,"2012-08-29","2012-08-29","2012-08-29",
"2012-09-28","2013-10-11"))
df <- data.frame(id, start_date, end_date, target_date)
Using the method to flatten overlapping time periods gets me close but I think it needs a lag adding in somewhere to replicate the target_date...
df <- df %>%
arrange(id, start_date) %>%
group_by(id) %>%
mutate(indx = c(0, cumsum(as.numeric(lead(start_date)) >
cummax(as.numeric(end_date)))[-n()])) %>%
group_by(id, indx) %>%
mutate(latest_date = max(end_date)) %>%
ungroup()