I have two dataframes. Both have variables "ID" (a grouping variable), "Timepoint" (class Date
), and a value variable "Values". Some example data:
df1 <- data.frame(
ID = c(1,1,1,1,2,2,3,3),
Timepoint = as.Date(c("2019-05-21", "2019-05-23", "2019-05-26", "2019-05-27", "2018-07-03", "2018-08-03", "2018-06-04", "2018-06-10")),
Values = c(NA, 23, NA, NA, NA, 35, 23, 42)
)
ID Timepoint Values
1 1 2019-05-21 NA
2 1 2019-05-23 23
3 1 2019-05-26 NA
4 1 2019-05-27 NA
5 2 2018-07-03 NA
6 2 2018-08-03 35
7 3 2018-06-04 23
8 3 2018-06-10 42
df2 <- data.frame(
ID = c(1,1,2,2,3),
Timepoint = as.Date(c("2019-05-20", "2019-05-24", "2018-09-03", "2018-06-04", "2018-06-10")),
Values = c(11, 30, 35, 23, 42))
ID Timepoint Values
1 1 2019-05-20 11
2 1 2019-05-24 30
3 2 2018-09-03 35
4 2 2018-06-04 23
5 3 2018-06-10 42
In "df1", the value variable has some missing values NA
. I want to replace the NA
"Values" in "df1" with values from "df2", which has the same "ID" and closest date ("Timepoint").
My desired result:
ID Timepoint Values
1 1 2019-05-21 11
2 1 2019-05-23 23
3 1 2019-05-26 30
4 1 2019-05-27 30
5 2 2018-07-03 23
6 2 2018-08-03 35
7 3 2018-06-04 23
8 3 2018-06-10 42
Note that for ID 1, the 30 has been used twice to fill in the missing value as both were closest to 2019-05-24.
I tried joining using roll = "nearest"
from data.table
. But from what I understand, each value from df2 was only being used once.
I'd preferably use lubridate, tidyverse. But open to other packages if needed.