R: Compare dates in two dataframes and isolate rows that match within a certain time period in R

Question

I have two dataframes in R:

df1:
ID  Date Discharged
1   2014-08-04
2   2014-12-10
3   2015-01-01


df2:
ID   Check-in-Date
 1   2013-01-02
 1   2014-08-11
 2   2014-12-14
 2   2015-05-01
 3   2012-05-06
 3   2015-01-05

I need to compare df1 with df2 based on ID and see which person checked in for another appointment within 7 days of being discharged. How would I accomplish this since df2 has duplicate IDs?

I'd like to create a new column in df1 with 1 if the person checked in and 0 if they didn't. I also need a new column with the check-in-date if they checked in with 7 days, or blank if they didn't.

NEW df1:
ID  Date Discharged  Within_7days   7-day Checkin
1   2014-08-04           1           2014-08-11
2   2014-12-10           1           2014-12-14
3   2015-01-01           1

Yes, some IDs will show up more than twice in the second df. — AlternativeHacks, Apr 24 '16 at 20:52

Michael Gao · Accepted Answer · 2016-04-24T22:49:30.447

Your last example should have a Within_7days value of 1 for ID of 3 according to your data.

The package lubridate is great for working with times. Also, I use dplyr for manipulating datasets.

# Load Libraries
library(dplyr)
library(lubridate)

# Recreate dataframes
df1 <- data.frame(ID = 1:3, Date.Discharged = c("2014-08-04","2014-12-10","2015-01-01"))
df2 <- data.frame(ID = c(1,2,2,3), CheckDate = c("2014-08-11","2014-12-14","2015-05-01","2015-01-05"))

# Make the characters in Date.Discharged and CheckDate into Date objects:
df1$Date.Discharged <- as.Date(df1$Date.Discharged)
df2$CheckDate <- as.Date(df2$CheckDate)

I assume that you only care about the minimum value for each ID for the check-in Date. That way if you select the minimum value, you get rid of any problems with multiple entries with the same ID.

df2 <- df2 %>% group_by(ID) %>% summarize(CheckDate = min(CheckDate))
# Now join the 2 dataframes together
df1 <- left_join(df1,df2) # this joins by ID by default

Finally, we make the column with Y/N

df1 <- df1 %>% mutate(within_7days = (CheckDate <= Date.Discharged + days(7))*1)

df1


  ID Date.Discharged  CheckDate within_7days
1  1      2014-08-04 2014-08-11            1
2  2      2014-12-10 2014-12-14            1
3  3      2015-01-01 2015-01-05            1

EDIT:

This is slightly inefficient, but will handle the case where you have values in the 2nd dataset that occur before the discharge date:

We can ignore the following line above:

df2 <- df2 %>% group_by(ID) %>% summarize(CheckDate = min(CheckDate))

and instead go straight to the join step and modify it as follows:

df1 <- left_join(df1,df2) %>% filter(CheckDate > Date.Discharged)

This will add all of the CheckDates that take place each Date Discharged for the same ID. This is again assuming that the ID's in DF1 are unique.

Hmm the problem with selecting the minimum value is this: For ID#1 in df1 there maybe a date of 2014-08-04 but in df2 there maybe check-in dates of 2013-01-01, 2014-08-07, etc. By choosing the minimum we ignore the fact that there could be dates before the one we're concerned about in df1. — AlternativeHacks, Apr 24 '16 at 22:38

R: Compare dates in two dataframes and isolate rows that match within a certain time period in R

1 Answers1

Linked