I want to merge two data frames in data.table
by the same ID. The first data frame df1 has unique IDs, the second df2 has duplicated IDs. I only want to merge IDs from df2 with the closest date to df1. Every other ID in df2 should be dropped. I was trying the nearestTimeandID
function in bayesbio
, but it takes too much time.
I tried to follow another comment suggesting the roll="nearest" in data.table
but the IDs are still duplicated.
This is the function I used in nearestTimeandDate:
library(data.table)
together <- nearestTimeandID(df1, df2, timeCol1 = "Date1", timeCol2 = "Date2", IDcol = "ID")
and with the roll = "nearest"
:
setDT(df1)[, join_date := Date1]
setDT(df2)[, join_date := Date2]
together <- df1[df2, on = .(ID, join_date), roll = "nearest"]
How can I merge the datasets and drop the duplicated IDs that are not the closest date in a time efficient way? Thanks!
sample data:
d <- as.Date("2020-07-10")
df1 <- data.table(ID=1, Date1=d, Val1=11)
df2 <- data.table(ID=rep(1, 2), Date2=c(d-3, d+1), Val2=21:22)