2

I want to calculate the average number of people in hospital at every hour during the day with tidyverse. Can someone help?

Here it is the ID, Admissions (Adm) and Disc (Discharges).

ID = c(101, 102,103, 104, 105, 106, 107)

Adm = as.POSIXct(c("2012-01-12 00:52:00", "2012-01-12 00:55:00", "2012-02-12 
                    01:35:00", "2012-02-12 03:24:00", "2012-02-12 04:24:00", 
                    "2012-02-12 05:24:00", "2012-02-12 05:28:00"))

Disc = as.POSIXct(c("2012-01-13 02:00:00", "2012-01-13 02:59:00", "2012-02-12 
                     03:01:00", "2012-02-12 05:01:00", "2012-02-12 06:01:00", 
                     "2012-02-12 08:01:00", "2012-02-12 08:01:00"))

df = data.frame(ID, Adm, Disc)

Can someone help please!

GaB
  • 1,076
  • 2
  • 16
  • 29
  • Each row represents a patient? – nghauran Jun 20 '18 at 09:06
  • yes, each row in ID is a patient, and the time when he was admitted and discharged. – GaB Jun 20 '18 at 09:08
  • https://stackoverflow.com/questions/27487949/how-to-perform-join-over-date-ranges-using-data-table – abhiieor Jun 20 '18 at 09:29
  • Is it normal to find _Adm_ `2012-02-12` and for the same patient _Discharges_ `2012-01-12`? – nghauran Jun 20 '18 at 09:38
  • I have corrected the dates. This is an example of data that I chosen not the real data I am working on. And, indeed I have mistaken. I have corrected everything now and hope there are no mistakes. Overall you got the point. – GaB Jun 20 '18 at 10:05

1 Answers1

1

Here is the tidyverse approach :-

Basically hours between Adm & Disc are calculated using seq -

  • e.g. hours between Adm = 2012-01-12 00:52:00 & Disc = 2012-01-12 02:00:00 for ID 101 would be 2012-01-12 00:00:00, 2012-01-12 01:00:00 & 2012-01-12 02:00:00.

These hours are concatenated using paste into a single column hours_list for each row and later separated into multiple rows using separate_rows.

Finally unique ID count is calculated by grouping on the calculated hours between admission and discharge timings.

library(tidyverse)
library(lubridate)

df %>%
  mutate_at(vars(Adm:Disc), funs(ymd_h(strftime(., format = "%Y-%m-%d %H")))) %>% #date-hour is separated from timestamp and then converted into POSIXct format
  rowwise() %>%
  mutate(hours_list = paste(seq(Adm, Disc, by = "hour"), collapse = ",")) %>%     #hours between Adm & Disc are calculated and concatenated by ','
  separate_rows(hours_list, sep = ",") %>%                                        #calculated hours are separated into multiple rows
  mutate(hours_list = ymd_hms(hours_list)) %>%                                    #calculated hours are converted into POSIXct format
  group_by(hours_list) %>%
  summarise(patient_count = n_distinct(ID))                                       #unique patient count is calculated by grouping on calculated hours_list

which gives

   hours_list          patient_count
   <dttm>                      <int>
 1 2012-01-12 00:00:00             2
 2 2012-01-12 01:00:00             2
 3 2012-01-12 02:00:00             2
 4 2012-02-12 01:00:00             1
 5 2012-02-12 02:00:00             1
 6 2012-02-12 03:00:00             2
 7 2012-02-12 04:00:00             2
 8 2012-02-12 05:00:00             4
 9 2012-02-12 06:00:00             3
10 2012-02-12 07:00:00             2
11 2012-02-12 08:00:00             2


Sample data:

df <- structure(list(ID = c(101, 102, 103, 104, 105, 106, 107), Adm = structure(c(1326309720, 
1326309900, 1328990700, 1328997240, 1329000840, 1329004440, 1329004680
), class = c("POSIXct", "POSIXt"), tzone = ""), Disc = structure(c(1326313800, 
1326317340, 1328995860, 1329003060, 1329006660, 1329013860, 1329013860
), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("ID", 
"Adm", "Disc"), row.names = c(NA, -7L), class = "data.frame")

#   ID                 Adm                Disc
#1 101 2012-01-12 00:52:00 2012-01-12 02:00:00
#2 102 2012-01-12 00:55:00 2012-01-12 02:59:00
#3 103 2012-02-12 01:35:00 2012-02-12 03:01:00
#4 104 2012-02-12 03:24:00 2012-02-12 05:01:00
#5 105 2012-02-12 04:24:00 2012-02-12 06:01:00
#6 106 2012-02-12 05:24:00 2012-02-12 08:01:00
#7 107 2012-02-12 05:28:00 2012-02-12 08:01:00
Prem
  • 11,775
  • 1
  • 19
  • 33
  • Hey Prem - got an error - Error in mutate_impl(.data, dots) : Evaluation error: 'to' must be a finite number – GaB Jun 21 '18 at 10:54
  • 1
    In your original data it seems that `Disc` is `NA` in few rows. You can test it using `df[is.na(df$Disc),]`. If this is the case then you may want to remove those rows and it can be done by adding `df %>% na.omit()` at the beginning of my code. – Prem Jun 21 '18 at 10:56
  • hello Prem, I identified the real issue within my data set . Indeed this is related to missing value. Thank you a lot, Prem. You rock :)) – GaB Jun 21 '18 at 12:33
  • Glad that it helped :) – Prem Jun 21 '18 at 12:34