0

overflowers,

I have an issue to solve with regards to counting the patients:

  • fulfilling certain criteria but at different point in time. To the problem which I am trying to solve, different points in time, are dates, but rounded to the last day of the month. As you can see, I have patients before 2011, those whose dates are in 2008, 2009, 2010. These, then shall be counted to 2011-01-31 date. With the other dates, from 2011 on wards, these shall be counted to the rounded date (last day of the month).

The criteria for a patient to be counted is:

  • the patient was discharged and was diagnosed.

Here is an example:

I have id

ID = c(101, 102,103, 104, 105, 106, 107, 108, 109,110, 111, 112, 113, 114)

I have registration dates

Reg.date = as.POSIXct(c("2008-01-14", "2008-03-19", "2009-06-13", 
                        "2010-08-17", "2011-09-11", "2012-08-14",
                        "2013-09-21",   "2013-08-09", "2014-02-15", 
                        "2014-07-22",  "2015-03-13",  "2015-05-30" , 
                         "2017-01-26", "2017-01-01"))

I deducted date -Deduct.date- with NA values, this means the patients are still alive. Where there are dates, that means, the patients are discharged. The same with Diagnosis - the NA means the patient wasn#t diagnosed.

Deduct.date = as.POSIXct(c("2008-09-16 ", "2010-01-13", "2010-02-12", 
                           "2011-02-12",    NA,       "2012-02-12",
                           "2012-02-12",     NA,           NA,
                             NA,           "2014-04-12",   NA,
                             NA,          "2017-02-11" ))

Diagnosis = as.POSIXct(c("2008-05-11" , "2009-03-13", "2009-01-03",
                            NA,         NA,           NA,
                          "2011-04-05",   NA,           NA,
                          "2013-03-05",   NA,            NA,
                          NA))

df = data.frame(ID, Reg.date, Deduct.date)

I am looking to do this with tidyverse and lubridate. And the outcome can be given in a dataframe with counts for the rounded dates, with reference to the ones provided above.

GaB
  • 1,076
  • 2
  • 16
  • 29
  • 1
    Your examples throw a bunch of errors. You can just use `dput()` for creating a reproducible example – Sotos Jun 27 '18 at 12:34
  • What have you tried? Why the limit to lubridate and tidyverse? – Roman Luštrik Jun 27 '18 at 12:36
  • Just to clarify `NA` within 'Deduct date' means a patient is still alive _and under medical care_, and specifically 'Deduct date' is defined as a patient being _either_ deceased or discharged? – dad Jun 27 '18 at 12:36
  • Also it looks like you're directly typing in "NA" which is a string, not the same as what R uses, which is why we need you to paste the output of use something like `dput(head(df))`, or `dput(my_sample_df)` [a la this guide](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) (just Ctrl-F for `dput`) – dad Jun 27 '18 at 12:43
  • hello people, I am new to R and stackoverflow. Therefore, I do apologise for any errors you get. Will learn the dput() so that I will reproduce the example. – GaB Jun 27 '18 at 15:21
  • hello shib - yes, NA within Deduct date and Diagnosis alive respectively not diagnosed. So, here NA's are very useful. – GaB Jun 27 '18 at 16:08
  • Hi Gabriel, that might not be necessary for this example since I worked with what you provided already below, but it's good to keep in mind to use `dput` on small subsets for future reference. – dad Jun 27 '18 at 16:42

1 Answers1

0

Here's my solution, using lubridate:

library(dplyr)
library(lubridate)

ID = c(101, 102,103, 104, 105, 106, 107, 108, 109,110, 111, 112, 113, 114)

Reg.date = as.POSIXct(c("2008-01-14", "2008-03-19", "2009-06-13", 
                        "2010-08-17", "2011-09-11", "2012-08-14",
                        "2013-09-21",   "2013-08-09", "2014-02-15", 
                        "2014-07-22",  "2015-03-13",  "2015-05-30", 
                        "2017-01-26", "2017-01-01"))

Deduct.date = as.POSIXct(c("2008-09-16 ", "2010-01-13", "2010-02-12", 
                           "2011-02-12",    NA,       "2012-02-12",
                           "2012-02-12",     NA,           NA,
                           NA,           "2014-04-12",   NA,
                           NA,          "2017-02-11" ))

Diagnosis = as.POSIXct(c("2008-05-11" , "2009-03-13", "2009-01-03",
                         NA,         NA,           NA,
                         "2011-04-05",   NA,           NA,
                         "2013-03-05",   NA,            NA,
                         NA))

# Convert dates to lubridate date objects
(df <- data.frame(ID, Reg.date, Deduct.date) %>% 
  mutate(Reg.date = as_date(Reg.date), Deduct.date = as_date(Deduct.date)))
#>     ID   Reg.date Deduct.date
#> 1  101 2008-01-14  2008-09-16
#> 2  102 2008-03-19  2010-01-13
#> 3  103 2009-06-13  2010-02-12
#> 4  104 2010-08-17  2011-02-12
#> 5  105 2011-09-11        <NA>
#> 6  106 2012-08-14  2012-02-12
#> 7  107 2013-09-21  2012-02-12
#> 8  108 2013-08-09        <NA>
#> 9  109 2014-02-15        <NA>
#> 10 110 2014-07-22        <NA>
#> 11 111 2015-03-13  2014-04-12
#> 12 112 2015-05-30        <NA>
#> 13 113 2017-01-26        <NA>
#> 14 114 2017-01-01  2017-02-11

# Prior to 2011 dates
df[!is.na(df$Reg.date) & !is.na(df$Deduct.date) & 
     df$Reg.date < as_date('2011-01-01'),]$Reg.date <- as_date('2011-01-31')

df[!is.na(df$Deduct.date) & !is.na(df$Deduct.date) & 
     df$Deduct.date < as_date('2011-01-01'),]$Deduct.date <- as_date('2011-01-31')

# 2011 onwards
lastDayOfMonth_Reg <- df[!is.na(df$Reg.date) & !is.na(df$Deduct.date) & 
                           df$Reg.date > as_date('2011-01-01'),]$Reg.date %>% 
  ceiling_date('month') - 1
df[!is.na(df$Reg.date) & !is.na(df$Deduct.date) & 
     df$Reg.date > as_date('2011-01-01'),]$Reg.date <- lastDayOfMonth_Reg

lastDayOfMonth_Deduct <- df[!is.na(df$Reg.date) & !is.na(df$Deduct.date) & 
                              df$Deduct.date > as_date('2011-01-01'),]$Deduct.date %>% 
  ceiling_date('month') - 1
df[!is.na(df$Reg.date) & !is.na(df$Deduct.date) & 
     df$Deduct.date > as_date('2011-01-01'),]$Deduct.date <- lastDayOfMonth_Deduct

df
#>     ID   Reg.date Deduct.date
#> 1  101 2011-01-31  2011-01-31
#> 2  102 2011-01-31  2011-01-31
#> 3  103 2011-01-31  2011-01-31
#> 4  104 2011-01-31  2011-02-28
#> 5  105 2011-09-11        <NA>
#> 6  106 2012-08-31  2012-02-29
#> 7  107 2013-09-30  2012-02-29
#> 8  108 2013-08-09        <NA>
#> 9  109 2014-02-15        <NA>
#> 10 110 2014-07-22        <NA>
#> 11 111 2015-03-31  2014-04-30
#> 12 112 2015-05-30        <NA>
#> 13 113 2017-01-26        <NA>
#> 14 114 2017-01-31  2017-02-28

I don't know an incredibly elegant way to apply a filter condition and simultaneously mutate specific dplyr columns using the mutate_[if/all/at] functions here, so I used some base R as well.

[Note: I included the sample data here for reproducibility since my peer reviewed edit on some corrections to OP's sample data i.e. commas / NA quotes / spelling etc is still pending]

dad
  • 1,335
  • 9
  • 28