I am working with the following dataset:
library(tidyverse)
library(lubridate)
df <- data.frame(
  icustay_id = c(1, 1, 1, 2, 3),
  starttime = as.POSIXct(c("2019-09-10 13:20", "2019-09-11 13:30", "2019-09-14 16:40", "2019-09-10 12:40", "2019-09-10 01:20")),
  endtime = as.POSIXct(c("2019-09-11 13:20", "2019-09-12 01:20", "2019-09-15 16:40", "2019-09-13 13:20", "2019-09-11 13:20")),
  vaso_rate = sample(1:10, 5, replace = TRUE),
  vaso_amount = runif(5, 0, 1000)
)
df
#   icustay_id           starttime             endtime vaso_rate vaso_amount
# 1          1 2019-09-10 13:20:00 2019-09-11 13:20:00         3    293.0896
# 2          1 2019-09-11 13:30:00 2019-09-12 01:20:00         9    602.9983
# 3          1 2019-09-14 16:40:00 2019-09-15 16:40:00         4    208.9360
# 4          2 2019-09-10 12:40:00 2019-09-13 13:20:00         2    864.1494
# 5          3 2019-09-10 01:20:00 2019-09-11 13:20:00         9    405.2939
Each row gives the starttime and endtime (both POSIXct) of a medication dose received by a patient.
I am trying to build a function that will:
- For each patient (each unique icustay_id), merge consecutive rows where the medication was stopped for less than an hour.
- When rows are merged, some columns keep the same value (i.e. the patient identifiers), while others must be modified:
  - Keep the earlier starttime
  - Keep the later endtime
  - Average the vaso_rate
  - Sum the vaso_amount
  - Delete the durations
I am struggling with the second part: I can't find a good way to express this conditional "merge".
The goal is to obtain something like:
df
#   icustay_id           starttime             endtime vaso_rate vaso_amount
# 1          1 2019-09-10 13:20:00 2019-09-12 01:20:00         6    896.0879
# 2          1 2019-09-14 16:40:00 2019-09-15 16:40:00         4    208.9360
# 3          2 2019-09-10 12:40:00 2019-09-13 13:20:00         2    864.1494
# 4          3 2019-09-10 01:20:00 2019-09-11 13:20:00         9    405.2939
Notice that for the patient with icustay_id 1, only the consecutive events separated by less than 1 hour (next starttime minus previous endtime) were merged, while the third, more distant event (more than 1 hour after the others) was kept separate.
This is what I have so far. I tried to add a group column flagging the rows that meet the condition above, and then group_by that column.
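As a side note on computing that difference: subtracting two POSIXct values returns a difftime whose unit R chooses automatically, so comparing the result with a bare number can be misleading. A minimal sketch, using two timestamps from patient 1 (the variable names are only illustrative, and the time zone is fixed to UTC as an assumption):

```r
# Times taken from patient 1 in the example (UTC assumed for reproducibility)
end_prev   <- as.POSIXct("2019-09-12 01:20", tz = "UTC")
start_next <- as.POSIXct("2019-09-14 16:40", tz = "UTC")

# Plain subtraction lets R pick the unit; for a gap longer than a day it is "days"
auto_gap <- start_next - end_prev
units(auto_gap)

# The underlying number is then small (days, not minutes),
# so a check like "< 60" would pass even though the gap is far longer than an hour
as.numeric(auto_gap) < 60

# Asking for minutes explicitly makes the threshold unambiguous
gap_min <- as.numeric(difftime(start_next, end_prev, units = "mins"))
gap_min < 60
```

Requesting the unit explicitly with difftime(..., units = "mins") keeps the one-hour rule independent of how large the gap happens to be.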
But it does not work...
merge_pressor_doses <- function(df){
  df %>% arrange(icustay_id, starttime)
  a <- 1
  for (i in unique(df$icustay_id))
  {
    for (j in which(df$icustay_id == i) && j < max(which(df$icustay_id == i)))
    {
      df %>% mutate(group = ifelse(df$starttime[j+1] - df$endtime[j] < 60, a, 0))
    }
  }
  df %>% group_by(group) %>%
    summarise(
      starttime = min(starttime),
      endtime = max(endtime),
      vaso_rate = mean(vaso_rate),
      sum_vaso_amount = sum(vaso_amount))
  return(df)
}
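For comparison, here is one possible loop-free approach, offered as a sketch rather than a definitive solution. It flags the start of each new "episode" within a patient using the gap to the previous row, then numbers episodes with cumsum and summarises per episode. The helper columns gap_min and episode are hypothetical names, the random columns are replaced by the fixed values from the printout so the result is reproducible, and the time zone is set to UTC as an assumption. It compares each row only with the row immediately before it, so overlapping or nested intervals would need extra care.

```r
library(dplyr)

# Toy data mirroring the example, with fixed values instead of random ones
df <- data.frame(
  icustay_id  = c(1, 1, 1, 2, 3),
  starttime   = as.POSIXct(c("2019-09-10 13:20", "2019-09-11 13:30", "2019-09-14 16:40",
                             "2019-09-10 12:40", "2019-09-10 01:20"), tz = "UTC"),
  endtime     = as.POSIXct(c("2019-09-11 13:20", "2019-09-12 01:20", "2019-09-15 16:40",
                             "2019-09-13 13:20", "2019-09-11 13:20"), tz = "UTC"),
  vaso_rate   = c(3, 9, 4, 2, 9),
  vaso_amount = c(293.0896, 602.9983, 208.9360, 864.1494, 405.2939)
)

merged <- df %>%
  arrange(icustay_id, starttime) %>%
  group_by(icustay_id) %>%
  mutate(
    # Minutes between this row's start and the previous row's end (NA on a patient's first row)
    gap_min = as.numeric(difftime(starttime, dplyr::lag(endtime), units = "mins")),
    # A new episode begins on the first row or whenever the gap is 60 minutes or more
    episode = cumsum(is.na(gap_min) | gap_min >= 60)
  ) %>%
  group_by(icustay_id, episode) %>%
  summarise(
    starttime   = min(starttime),   # keep the earlier start
    endtime     = max(endtime),     # keep the later end
    vaso_rate   = mean(vaso_rate),  # average the rate
    vaso_amount = sum(vaso_amount), # sum the amount
    .groups = "drop"
  ) %>%
  select(-episode)

merged
```

With this data, the first two rows of patient 1 collapse into one row (rate averaged, amount summed) and the third stays separate, giving four rows in total.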