I have two data frames that I am trying to join by date (grouped by individual).
I have made example data frames of both (the real df1 is 5700 rows, and the real df2 is 287 rows).
df1 has IDs (including some not in df2), dates, and behavior values.
df2 has IDs (though fewer than df1), dates (fewer than those in df1), and hormone values.
My goal is to match the hormones for a given individual from the nearest date in df2 to the nearest date in df1 (matching as closely as possible but only duplicating the values of hormones from df2 in df1 when the nearest dates are less than or equal to 2 days apart).
I would like to have the hormones that don't match a behavioral observation printed at the bottom of the new data frame with their date such that they aren't lost (example in df3)
df1
ID Date behavior
a 1-12-2020 0
b 1-12-2020 1
b 1-13-2020 1
c 1-12-2020 2
d 1-12-2020 0
c 1-13-2020 1
c 1-14-2020 0
c 1-15-2020 1
c 1-16-2020 2
df2
ID Date hormone
a 1-10-2020 20
b 1-18-2019 70
c 1-10-2020 80
c 1-16-2020 90
#goal dataframe
df3
ID Date behavior hormone
a 1-12-2020 0 20
b 1-12-2020 1 NA [> 2 days from hormone]
b 1-13-2020 1 NA [> 2 days from hormone]
c 1-12-2020 2 80
d 1-12-2020 0 NA [no matching individual in df2]
c 1-13-2020 1 NA [> 2 days from hormone]
c 1-14-2020 0 90
c 1-15-2020 1 90
c 1-16-2020 2 90
b 1-18-2019 NA 70 [unmatched hormone at bottom of df3]
here is the code to create these data frames:
df1 <- data.frame(ID = c("a", "b", "b", "c", "d", "c", "c","c", "c"),
date = c("1-12-2020", "1-12-2020", "1-13-2020", "1-12-2020", "1-12-2020","1-13-2020","1-14-2020","1-15-2020","1-16-2020"),
behavior = c(0,1,1,2,0,1,0,1,2) )
df2 <- data.frame(ID = c("a", "b", "c", "c"),
date = c("1-10-2020", "1-18-2019", "1-10-2020", "1-16-2020"),
hormone = c(20,70,80,90) )
df1$date<-as.factor(df1$date)
df1$date<-strptime(df1$date,format="%m-%d-%Y")
#for nearest date function to work
df1$date<-as.Date(df1$date,"%m/%d/%y")
df2$date<-as.factor(df2$date)
df2$date<-strptime(df2$date,format="%m-%d-%Y")
#for nearest date function to work
df2$date<-as.Date(df2$date,"%m/%d/%y")
I have been able to use a function from a previous question on the forum (link and code below) to match the nearest dates and duplicate to fill, but am not able to limit the time frame of matches, or print unmatched dates in new rows. Is there a way to do this?
This is what I started working from (code below): How to match by nearest date from two data frames?
# Function to get the index specifying closest or after
Ind_closest_or_after <- function(d1, d2){
which.min(ifelse(d1 - d2 < 0, Inf, d1 - d2))
}
# Calculate the indices
closest_or_after_ind <- map_int(.x = df1$date, .f = Ind_closest_or_after, d2 = df2$date)
# Add index columns to the data frames and join
df2 <- df2 %>%
mutate(ind = 1:nrow(df2))
df1 <- df1 %>%
mutate(ind = closest_or_after_ind)
df3<-left_join(df2, df1, by = 'ind')
This answer seems the closest but doesn't limit the values: Merge two data frames by nearest date and ID
#function to do all but limit dates and print unmatched
library(data.table)
setDT(df2)[, date := date]
df2[df1, on = .(ID, date = date), roll = 'nearest']