I have a dataset in long format with multiple start
and end
dates for patients with unique id
. Dates represent hospital admission and discharge. Some patients have multiple overlapping stays in hospital, some have stays that don't overlap, and other cases have start
(admission) and end
dates (discharge) on same day.
Building on a post that used a lagged start
date and the cummax
function, I wish to do 3 things:
- For cases with overlapping
start
andend
dates, combine/merge cases, keeping the earlieststart
date and lastend
date. - For cases with
start
andend
dates that are the same date, maintain that observation (don't merge). - Create new variable
surgdays
that is calculated from max days in surgical unit (surg
), for both merged and non-merged cases.
I have data like this:
id start end surg
1 A 2013-01-01 2013-01-05 0
2 A 2013-01-08 2013-01-12 1
3 A 2013-01-10 2013-01-14 6
4 A 2013-01-20 2013-01-20 3
5 B 2013-01-15 2013-01-25 4
6 B 2013-01-20 2013-01-22 5
7 B 2013-01-28 2013-01-28 0
What I've tried:
library(dplyr)
data %>%
arrange(data, id, start) %>%
group_by(id) %>%
mutate(indx = c(0, cumsum(as.numeric(lead(start)) >
cummax(as.numeric(end)))[-n()])) %>%
group_by(id, indx) %>%
summarise(start = first(start), end = last(end), surgdays = max(surg))
What I get:
id indx start end surgdays
1 A 0 2013-01-01 2013-01-05 0
2 A 1 2013-01-08 2013-01-14 7
3 A 2 2013-01-20 2013-01-20 3
The problem: the number of rows examined with this code is limited to the number of columns in my dataset. For example, with 4 variables/columns, it worked with data from only first 4 rows (including merging two rows with overlapping dates) then stopped...even though there are 7 rows in example (and thousands of rows in actual dataset).
Similarly, when I try same code with 70 columns (and hundreds of rows), it combines overlapping dates but only based on the first 70 rows. My initial thought was to create as many placeholder columns as there are observations in my dataset but this is clunky workaround.
What I'd like to get:
id indx start end surgdays
1 A 0 2013-01-01 2013-01-05 0
2 A 1 2013-01-08 2013-01-14 7
3 A 2 2013-01-20 2013-01-20 3
4 B 0 2013-01-15 2013-01-22 9
5 B 1 2013-01-28 2013-01-28 0