I have a data frame which looks like the following:
id <- c("Joe" ,"Joe" ,"Joe" ,"Joe" ,"Joe")
work_start <- as.Date(c("2004-06-23", "2005-04-20", "2005-05-24", "2014-05-01", "2018-04-01"))
work_end <- as.Date(c("2014-04-30", "2010-03-11", "2005-07-05", "2018-03-31", "2019-03-31"))
df <- data.frame(id, work_start, work_end)
I want to calculate total days for Joe that he has been in work also factoring in consecutive days - so in the example above that would be a continuous period from 23 Jun 2004 to 31 Mar 2019 (as the gap between 30 Apr 2014 and 1 May 2014 is a consecutive day).
I am trying to do this by calculating the last available day and have part of a process below, but not sure how I write it into a loop within dplyr, or if this is even the right approach to take with this problem at all? Any help much appreciated.
library(dplyr)
df <- df %>%
group_by(id) %>%
arrange(id, work_start, work_end) %>%
mutate(last_work_end = lag(work_end)) %>%
mutate(last_work_end = if_else(lag(last_work_end) > last_work_end & is.finite(lag(last_work_end)),
lag(last_work_end),
last_work_end)) %>%
mutate(last_work_end = if_else(lag(last_work_end) > last_work_end & is.finite(lag(last_work_end)),
lag(last_work_end),
last_work_end)) %>%
ungroup()