How to create a dummy variable in R for dates that lie between a certain interval?

Question

I have some hospital data that looks like this:

patient_id	treatment_1	treatment_2	date_dummy
3	2012-01-04	2012-03-27	0
3	2012-07-11	2012-10-20	0
3	2013-04-04	2013-06-22	0
12	2012-12-09	2013-11-09	0
18	2012-02-25	2012-03-26	0
25	2012-10-06	2013-12-29	1
25	2013-04-06	2013-07-07	0

I need to re-create the date_dummy variable that equals 1 if the patient was treated again between the two treatment dates, and 0 otherwise. Patient 25 is the best example of this.

If anyone knows a command to do this using the dplyr package in R, that would be awesome. Thanks for any help.

Wouldn't it be good if the actual overlapping row, i.e. the second row of patient 25, were flagged as `treated again`? Just asking for my own knowledge, thanks. – AnilGoyal, Apr 27 '21 at 05:27
Seems like a duplicate of https://stackoverflow.com/questions/67288732/how-do-i-determine-in-r-if-a-date-interval-overlaps-another-date-interval-for-th/67289069#67289069, is this a homework question? — dash2, Apr 28 '21 at 11:08

score 2 · Answer 1 · answered Apr 27 '21 at 00:48

To check whether a date is within the range of two other dates, you can use:

library(lubridate)
x %within% interval(ymd(20161001), ymd(20170930))

This checks whether x is between October 1st 2016 and Sep 30th, 2017.
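As a small self-contained illustration (the dates in `x` and the name `inside` are made up for this sketch), `%within%` treats both endpoints of the interval as inside it:

```r
library(lubridate)

# Hypothetical test dates: one day before the start, the start itself,
# the end itself, and one day after the end of the interval.
x <- ymd(c("2016-09-30", "2016-10-01", "2017-09-30", "2017-10-01"))

# TRUE where the date falls inside the interval, endpoints included.
inside <- x %within% interval(ymd(20161001), ymd(20170930))
inside
# [1] FALSE  TRUE  TRUE FALSE
```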

I'm not sure what your date for 'treated again' within the two treatment dates is called but something like this may work:

data %>%
    mutate(date_dummy = ifelse(treated_again_date %within% interval(treatment_1, treatment_2), 1, 0))

Rex Parsons

I think that's on the right track, but I don't have a "treated_again_variable". The dummy needs to be created by person_id and with only the two treatment variables. I'm basically looking for if the two intervals overlap. – gobygoul Apr 27 '21 at 01:12
Ahh I see. I think I misunderstood your question beforehand but I see what you mean now. Glad you found a solution! – Rex Parsons Apr 27 '21 at 22:14

score 0 · Accepted Answer · answered Apr 27 '21 at 02:07

Building upon @Rex Parsons' answer, you can do:

library(dplyr)
library(lubridate)
library(purrr)

df %>%
  mutate(across(starts_with('treatment'), as.Date), 
         interval = interval(treatment_1, treatment_2)) %>%
  group_by(patient_id) %>%
  mutate(date_dummy = map_int(row_number(), 
                       ~as.integer(any(interval[-.x] %within% interval[.x])))) %>%
  ungroup()

#  patient_id treatment_1 treatment_2 date_dummy interval                      
#       <int> <date>      <date>           <int> <Interval>                    
#1          3 2012-01-04  2012-03-27           0 2012-01-04 UTC--2012-03-27 UTC
#2          3 2012-07-11  2012-10-20           0 2012-07-11 UTC--2012-10-20 UTC
#3          3 2013-04-04  2013-06-22           0 2013-04-04 UTC--2013-06-22 UTC
#4         12 2012-12-09  2013-11-09           0 2012-12-09 UTC--2013-11-09 UTC
#5         18 2012-02-25  2012-03-26           0 2012-02-25 UTC--2012-03-26 UTC
#6         25 2012-10-06  2013-12-29           1 2012-10-06 UTC--2013-12-29 UTC
#7         25 2013-04-06  2013-07-07           0 2013-04-06 UTC--2013-07-07 UTC

You may want to remove the interval column from the final output if you don't need it.
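A minimal sketch of that last step, assuming the data frame is rebuilt by hand from the question's sample rows (the `df` construction and the `result` name are assumptions for this example, and the second row uses 2012-07-11 as in the output above). It reruns the same pipeline and drops the helper column with `select()`:

```r
library(dplyr)
library(lubridate)
library(purrr)

# Sample rows copied from the question; date_dummy is recomputed below.
df <- tibble(
  patient_id  = c(3L, 3L, 3L, 12L, 18L, 25L, 25L),
  treatment_1 = c("2012-01-04", "2012-07-11", "2013-04-04", "2012-12-09",
                  "2012-02-25", "2012-10-06", "2013-04-06"),
  treatment_2 = c("2012-03-27", "2012-10-20", "2013-06-22", "2013-11-09",
                  "2012-03-26", "2013-12-29", "2013-07-07")
)

result <- df %>%
  mutate(across(starts_with("treatment"), as.Date),
         interval = interval(treatment_1, treatment_2)) %>%
  group_by(patient_id) %>%
  # For each row, flag whether any other interval of the same patient
  # lies entirely within this row's interval.
  mutate(date_dummy = map_int(row_number(),
                       ~ as.integer(any(interval[-.x] %within% interval[.x])))) %>%
  ungroup() %>%
  select(-interval)  # drop the helper column once the dummy is computed

result
```

Note that `%within%` only flags a row when another treatment interval is fully contained in it, which is what the expected output for patient 25 asks for. If partial overlaps should count too, lubridate's `int_overlaps()` may be a better fit.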
