I have some hospital data that looks like this:

patient_id  treatment_1  treatment_2  date_dummy
3           2012-01-04   2012-03-27   0
3           2012-07-11   2012-10-20   0
3           2013-04-04   2013-06-22   0
12          2012-12-09   2013-11-09   0
18          2012-02-25   2012-03-26   0
25          2012-10-06   2013-12-29   1
25          2013-04-06   2013-07-07   0

I need to re-create the date_dummy variable that equals 1 if the patient was treated again between the two treatment dates, and 0 otherwise. Patient 25 is the best example of this.

If anyone knows a command to do this using the dplyr package in R, that would be awesome. Thanks for any help.

gobygoul
  • Wouldn't it be better if the actual overlapping row, i.e. the second row of patient 25, were flagged as `treated again`? Just asking for my own understanding, thanks. – AnilGoyal Apr 27 '21 at 05:27
  • Seems like a duplicate of https://stackoverflow.com/questions/67288732/how-do-i-determine-in-r-if-a-date-interval-overlaps-another-date-interval-for-th/67289069#67289069, is this a homework question? – dash2 Apr 28 '21 at 11:08

2 Answers

To check whether a date is within the range of two other dates, you can use:

library(lubridate)
x %within% interval(ymd(20161001), ymd(20170930))

This checks whether x falls between October 1st, 2016 and September 30th, 2017 (both endpoints included).
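
As a quick illustration of that check, here is a minimal sketch with two made-up dates. The values in `x` and the name `window` are assumptions for the example only, not something from the question. `%within%` is vectorised, so it returns one logical value per element of `x`.

```r
library(lubridate)

# Hypothetical dates, chosen only to illustrate the check
x <- ymd(c("2016-12-25", "2018-01-01"))

# Same range as above, stored in a variable for readability
window <- interval(ymd("2016-10-01"), ymd("2017-09-30"))

# One logical per element of x: the first date is inside the range,
# the second is after it
inside <- x %within% window
inside
```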

I'm not sure what the column holding your 'treated again' date is called, but something like this may work:

library(dplyr)
data %>%
    # treated_again_date is a placeholder for your own column name
    mutate(date_dummy = ifelse(treated_again_date %within% interval(treatment_1, treatment_2), 1, 0))
Rex Parsons
  • I think that's on the right track, but I don't have a "treated_again_variable". The dummy needs to be created by person_id and with only the two treatment variables. I'm basically looking for if the two intervals overlap. – gobygoul Apr 27 '21 at 01:12
  • Ahh I see. I think I misunderstood your question beforehand but I see what you mean now. Glad you found a solution! – Rex Parsons Apr 27 '21 at 22:14

Building upon @Rex Parsons's answer, you can do:

library(dplyr)
library(lubridate)
library(purrr)

df %>%
  mutate(across(starts_with('treatment'), as.Date), 
         interval = interval(treatment_1, treatment_2)) %>%
  group_by(patient_id) %>%
  mutate(date_dummy = map_int(row_number(), 
                       ~as.integer(any(interval[-.x] %within% interval[.x])))) %>%
  ungroup()

#  patient_id treatment_1 treatment_2 date_dummy interval                      
#       <int> <date>      <date>           <int> <Interval>                    
#1          3 2012-01-04  2012-03-27           0 2012-01-04 UTC--2012-03-27 UTC
#2          3 2012-07-11  2012-10-20           0 2012-07-11 UTC--2012-10-20 UTC
#3          3 2013-04-04  2013-06-22           0 2013-04-04 UTC--2013-06-22 UTC
#4         12 2012-12-09  2013-11-09           0 2012-12-09 UTC--2013-11-09 UTC
#5         18 2012-02-25  2012-03-26           0 2012-02-25 UTC--2012-03-26 UTC
#6         25 2012-10-06  2013-12-29           1 2012-10-06 UTC--2013-12-29 UTC
#7         25 2013-04-06  2013-07-07           0 2013-04-06 UTC--2013-07-07 UTC

You may want to remove the `interval` column from the final output if you don't need it.
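
For a reproducible run, the sketch below retypes the question's sample data into a data frame (using 2012-07-11 for the second row, as in the printed output above), applies the same pipeline, and then drops the helper column with `select(-interval)`. The names `df` and `result` are only illustrative.

```r
library(dplyr)
library(lubridate)
library(purrr)

# Sample data retyped from the question (assumption: row 2 starts on 2012-07-11)
df <- data.frame(
  patient_id  = c(3, 3, 3, 12, 18, 25, 25),
  treatment_1 = c("2012-01-04", "2012-07-11", "2013-04-04", "2012-12-09",
                  "2012-02-25", "2012-10-06", "2013-04-06"),
  treatment_2 = c("2012-03-27", "2012-10-20", "2013-06-22", "2013-11-09",
                  "2012-03-26", "2013-12-29", "2013-07-07")
)

result <- df %>%
  mutate(across(starts_with("treatment"), as.Date),
         interval = interval(treatment_1, treatment_2)) %>%
  group_by(patient_id) %>%
  # Flag a row when any other interval of the same patient lies inside it
  mutate(date_dummy = map_int(row_number(),
                       ~ as.integer(any(interval[-.x] %within% interval[.x])))) %>%
  ungroup() %>%
  select(-interval)  # drop the helper column once the flag is computed

result
```

Note that `%within%` only flags a row when another interval lies entirely inside it, which is why the first row of patient 25 gets a 1 and the second row does not.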

Ronak Shah