1

I have two dataframes in R:

df1:
ID  Date Discharged
1   2014-08-04
2   2014-12-10
3   2015-01-01


df2:
ID   Check-in-Date
 1   2013-01-02
 1   2014-08-11
 2   2014-12-14
 2   2015-05-01
 3   2012-05-06
 3   2015-01-05

I need to compare df1 with df2 based on ID and see which person checked in for another appointment within 7 days of being discharged. How would I accomplish this since df2 has duplicate IDs?

I'd like to create a new column in df1 with 1 if the person checked in and 0 if they didn't. I also need a new column with the check-in-date if they checked in with 7 days, or blank if they didn't.

NEW df1:
ID  Date Discharged  Within_7days   7-day Checkin
1   2014-08-04           1           2014-08-11
2   2014-12-10           1           2014-12-14
3   2015-01-01           1
AlternativeHacks
  • 147
  • 1
  • 10

1 Answers1

0

Your last example should have a Within_7days value of 1 for ID of 3 according to your data.

The package lubridate is great for working with times. Also, I use dplyr for manipulating datasets.

# Load Libraries
library(dplyr)
library(lubridate)

# Recreate dataframes
df1 <- data.frame(ID = 1:3, Date.Discharged = c("2014-08-04","2014-12-10","2015-01-01"))
df2 <- data.frame(ID = c(1,2,2,3), CheckDate = c("2014-08-11","2014-12-14","2015-05-01","2015-01-05"))

# Make the characters in Date.Discharged and CheckDate into Date objects:
df1$Date.Discharged <- as.Date(df1$Date.Discharged)
df2$CheckDate <- as.Date(df2$CheckDate)

I assume that you only care about the minimum value for each ID for the check-in Date. That way if you select the minimum value, you get rid of any problems with multiple entries with the same ID.

df2 <- df2 %>% group_by(ID) %>% summarize(CheckDate = min(CheckDate))
# Now join the 2 dataframes together
df1 <- left_join(df1,df2) # this joins by ID by default

Finally, we make the column with Y/N

df1 <- df1 %>% mutate(within_7days = (CheckDate <= Date.Discharged + days(7))*1)

df1


  ID Date.Discharged  CheckDate within_7days
1  1      2014-08-04 2014-08-11            1
2  2      2014-12-10 2014-12-14            1
3  3      2015-01-01 2015-01-05            1

EDIT:

This is slightly inefficient, but will handle the case where you have values in the 2nd dataset that occur before the discharge date:

We can ignore the following line above:

df2 <- df2 %>% group_by(ID) %>% summarize(CheckDate = min(CheckDate))

and instead go straight to the join step and modify it as follows:

df1 <- left_join(df1,df2) %>% filter(CheckDate > Date.Discharged)

This will add all of the CheckDates that take place each Date Discharged for the same ID. This is again assuming that the ID's in DF1 are unique.

Michael Gao
  • 125
  • 6
  • Hmm the problem with selecting the minimum value is this: For ID#1 in df1 there maybe a date of 2014-08-04 but in df2 there maybe check-in dates of 2013-01-01, 2014-08-07, etc. By choosing the minimum we ignore the fact that there could be dates before the one we're concerned about in df1. – AlternativeHacks Apr 24 '16 at 22:38