I'm trying to figure out how to combine rows that have a single column of dates, such that the new table/data frame/tibble will have two columns: one for the start date and one for the end date, but only for consecutive dates (i.e. any gaps in dates should be separated into a new row in the new table). It would also be grouped by different categorizations.
An example of the kind of data I'm manipulating is the following:
Person ID Department Date
351581 JE 12/1/2019
351581 JE 12/2/2019
351581 FR 12/2/2019
351581 JE 12/3/2019
598168 GH 12/16/2019
351581 JE 12/8/2019
351581 JE 12/9/2019
615418 AB 12/20/2019
615418 AB 12/22/2019
And the desired result would be:
Person ID Department Start Date End Date
351581 JE 12/1/2019 12/3/2019
351581 FR 12/2/2019 12/2/2019
598168 GH 12/16/2019 12/16/2019
351581 JE 12/8/2019 12/9/2019
615418 AB 12/20/2019 12/20/2019
615418 AB 12/22/2019 12/22/2019
My searches so far have turned up a couple of possibly-related questions that involve combining date ranges, but I'm not sure how they'd be applied to just a single column of dates:
Find all date ranges for overlapping start and end dates in R
dplyr
Adding this for the benefit of future folks, I ended up applying the accepted solution with dplyr, just because I'm more comfortable with the syntax.
df %>%
mutate(Date = as.Date(Date)) %>%
arrange(`Person ID`, Department, Date) %>%
group_by(`Person ID`, Department,
g = cumsum(c(0, diff(Date)) != 1)
) %>%
summarize(Start = min(Date), End = max(Date)) %>%
ungroup %>%
select(-g)