2

My question extends this one: Generate rows between two dates into a data frame in R

I have a dataset on admissions, discharges and lengths of stay (Stay_in_days) of patients from a hospital. It looks like this:

ID Admission           Discharge             Stay_in_days 
1    2020-08-20 15:25:03 2020-08-21 21:09:34 1.239
2    2020-10-04 21:53:43 2020-10-09 11:02:57 4.548
... 

Dates are in POSIXct format so far.

I aim for this:

ID   Date                 Stay_in_days 
1    2020-08-20 15:25:03  0.357 
1    2020-08-21 21:09:49  1.239
2    2020-10-04 21:53:43  0.087
2    2020-10-05 00:00:00  1.087
2    2020-10-06 00:00:00  2.087
2    2020-10-07 00:00:00  3.087
2    2020-10-08 00:00:00  4.087
2    2020-10-09 11:02:57  4.548
...

What I have done so far:

M <- Map(seq, patients$Admission, patients$Discharge, by = "day")
patients2 <- data.frame(
  ID = rep.int(patients$ID, vapply(M, length, 1L)), 
  Date = do.call(c, M)
) 

patients <- patients %>%
mutate(
 Date2=as.Date(Date, format = "%Y-%m-%d"),
 Dat2=Date2+1,
 Diff=difftime(Date2, Date, units = "days")
)


but this gives me:

ID   Date                 Date2      Diff
1    2020-08-20 17:25:03  2020-08-21 0.375
1    2020-08-21 17:25:03  2020-08-22 0.357
2    2020-10-04 23:53:43  2020-10-05 0.087
2    2020-10-05 23:53:43  2020-10-06 0.087
2    2020-10-06 23:53:43  2020-10-07 0.087
2    2020-10-07 23:53:43  2020-10-08 0.087
2    2020-10-08 23:53:43  2020-10-09 0.087
...

Strangely enough, it adds two hours to the Admission date but calculates the correct length of stay. Can someone explain?

Here is some data:

structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20), Admission = structure(c(1597937103.872, 
1598717768.704, 1599060521.984, 1599758087.168, 1599815496.704, 
1600702198.784, 1600719631.36, 1601065923.584, 1601119400.96, 
1601215476.736, 1601236710.4, 1601416934.4, 1601499640.832, 1601545647.104, 
1601587328, 1601644868.608, 1601741206.528, 1601848423.424, 1601901245.44, 
1601913828.352), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    Discharge = structure(c(1598044189.696, 1598897337.344, 1599144670.208, 
    1599845118.976, 1599842366.464, 1602733683.712, 1603372135.424, 
    1601125168.128, 1601314173.952, 1605193905.152, 1602190259.2, 
    1601560720.384, 1601737143.296, 1602705634.304, 1602410460.16, 
    1602698425.344, 1601770566.656, 1602241377.28, 1602780476.416, 
    1602612048.896), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    Stay_in_days = c(1.239, 2.078, 0.974, 1.007, 0.311, 23.513, 
    30.7, 0.686, 2.254, 46.047, 11.036, 1.664, 2.749, 13.426, 
    9.527, 12.194, 0.34, 4.548, 10.176, 8.081)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

Thanks in advance for your help!

Kaya
  • 115
  • 6

2 Answers2

1

You can achieve this with pivot_longer from tidyr. Edit: with comments:

df1 <- df %>% 
  select(ID = ID, date1 = Admission, date2 = Discharge, Stay_in_days) %>% # prepare for pivoting
  pivot_longer(
    cols = starts_with("date"),
    names_to = "Date1",
    values_to = "Date",
  ) %>% # pivot to longformat
  select(-Date1) %>% # remove temporary Date1
  relocate(Stay_in_days, .after = Date) %>% # change column order
  group_by(ID) %>%
  mutate(idgroup = rep(row_number(), each=1:2, length.out = n())) %>% # id for admission = 1 and for discharge id = 2
  mutate(Stay_in_days = replace(Stay_in_days, row_number() == 1, 0)) %>%  # set Admission to zero
  ungroup() 

enter image description here

TarJae
  • 72,363
  • 6
  • 19
  • 66
  • 1
    Many thanks, @TarJae, this is very helpful. However, I would need every date in between Admission and Discharge as rows with the respective length of stay until then (please see table in my "I aim for this"). Do you have an idea on how to solve that? – Kaya Feb 07 '21 at 09:44
  • 1
    See my proposed answer, that take cares of the requirement. However it took very long to figure out the `stay` period calculation. – AnilGoyal Feb 07 '21 at 10:24
  • Thanks AnilGoyal indeed "a bit crude" but I have no better solution. – TarJae Feb 07 '21 at 11:07
1

Though it is a bit crude but it'll work

library(tidyverse)
library(lubridate)

df %>% 
  pivot_longer(cols = -c(ID, Stay_in_days), names_to = "Event", values_to = "DATE") %>%
  group_by(ID) %>%
  mutate(dummy = case_when(Event == "Admission" ~ 0,
                           Event == "Discharge" ~ max(floor(Stay_in_days),1))) %>%
  complete(dummy = seq(min(dummy), max(dummy), 1)) %>%
  mutate(Event = ifelse(is.na(Event), "Dummy", Event),
         DATE = if_else(is.na(DATE), first(DATE)+dummy*24*60*60, DATE),
         Stay_in_days = case_when(Event == "Admission" ~ as.numeric(difftime(ceiling_date(DATE, "day"), DATE, units = "days")),
                                   Event == "Discharge" ~ Stay_in_days,
                                   TRUE ~ dummy + as.numeric(difftime(ceiling_date(first(DATE), "day"), first(DATE), units = "days")))) %>%
  select(ID, DATE, Stay_in_days)

# A tibble: 199 x 3
# Groups:   ID [20]
      ID DATE                Stay_in_days
   <dbl> <dttm>                     <dbl>
 1     1 2020-08-20 15:25:03        0.358
 2     1 2020-08-21 21:09:49        1.24 
 3     2 2020-08-29 16:16:08        0.322
 4     2 2020-08-30 16:16:08        1.32 
 5     2 2020-08-31 18:08:57        2.08 
 6     3 2020-09-02 15:28:41        0.355
 7     3 2020-09-03 14:51:10        0.974
 8     4 2020-09-10 17:14:47        0.281
 9     4 2020-09-11 17:25:18        1.01 
10     5 2020-09-11 09:11:36        0.617
# ... with 189 more rows

Explanation of logic For the first date in every ID, the stay_in_days gives the duration from admission date-time to following 24 Hrs. For intermediate dates, it just adds 1 to previous value. For discharge_date it retains the stay value calculated prior to pivoting. Hope this was you after.

Explanation of code After pivoting longer, I used a dummy column to create intermediate date-time objects. After that I just mutate the columns for generating output as described above.

AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
  • 1
    Dear @AnilGoyal, thanks again for your help on this. I have been using the code a lot and now realised a major problem: If the *time* of discharge is before the time of admission, then the code leaves out the last full day the patient was in the hospital. I hope you understand what I mean - Is there a way of solving that easily? – Kaya Mar 03 '21 at 17:22
  • Let me have a look. By the way can you include one example of this type? – AnilGoyal Mar 04 '21 at 00:38