Replace NA with values from another data frame, nearest in time, by group

Question

I have two dataframes. Both have variables "ID" (a grouping variable), "Timepoint" (class Date), and a value variable "Values". Some example data:

df1 <- data.frame(
  ID = c(1,1,1,1,2,2,3,3),
  Timepoint = as.Date(c("2019-05-21", "2019-05-23", "2019-05-26", "2019-05-27", "2018-07-03", "2018-08-03", "2018-06-04", "2018-06-10")),
  Values = c(NA, 23, NA, NA, NA, 35, 23, 42)
  )

  ID  Timepoint Values
1  1 2019-05-21     NA
2  1 2019-05-23     23
3  1 2019-05-26     NA
4  1 2019-05-27     NA
5  2 2018-07-03     NA
6  2 2018-08-03     35
7  3 2018-06-04     23
8  3 2018-06-10     42

df2 <- data.frame(
  ID = c(1,1,2,2,3),
  Timepoint = as.Date(c("2019-05-20", "2019-05-24", "2018-09-03", "2018-06-04", "2018-06-10")),
  Values = c(11, 30, 35, 23, 42))

  ID  Timepoint Values
1  1 2019-05-20     11
2  1 2019-05-24     30
3  2 2018-09-03     35
4  2 2018-06-04     23
5  3 2018-06-10     42

In "df1", the value variable has some missing values NA. I want to replace the NA "Values" in "df1" with values from "df2", which has the same "ID" and closest date ("Timepoint").

My desired result:

  ID  Timepoint Values
1  1 2019-05-21     11
2  1 2019-05-23     23
3  1 2019-05-26     30
4  1 2019-05-27     30
5  2 2018-07-03     23
6  2 2018-08-03     35
7  3 2018-06-04     23
8  3 2018-06-10     42

Note that for ID 1, the 30 has been used twice to fill in the missing value as both were closest to 2019-05-24.

I tried joining using roll = "nearest" from data.table. But from what I understand, each value from df2 was only being used once.

I'd preferably use lubridate, tidyverse. But open to other packages if needed.

`df1[is.na(Values), Values := df2[.SD, on = .(ID, Timepoint), roll = "nearest", x.Values]]`, seems to give the desired result. See e.g. [Find time to nearest occurrence of particular value for each row](https://stackoverflow.com/a/42382399/1851712) — Henrik, Apr 27 '23 at 16:14
Does this answer your question? [How to join two dataframes by nearest time-date?](https://stackoverflow.com/questions/39282749/how-to-join-two-dataframes-by-nearest-time-date) — I_O, Apr 27 '23 at 16:16
@I_O, I tried this solution, but I somehow lost some rows? Perhaps I did something wrong. But it did give an interesting results that I'll have to remember! — DVNST, Apr 28 '23 at 08:02
@Henrik, this solution is similar to what chsoul suggested, and it works, so thanks! — DVNST, Apr 28 '23 at 08:03

score 0 · Accepted Answer · answered Apr 27 '23 at 16:42

Here's a solution using data.table

library(data.table)

df1 <- data.table(
  ID = c(1,1,1,1,2,2,3,3),
  Timepoint = as.Date(c("2019-05-21", "2019-05-23", "2019-05-26", "2019-05-27", "2018-07-03", "2018-08-03", "2018-06-04", "2018-06-10")),
  Values = c(NA, 23, NA, NA, NA, 35, 23, 42)
)
df2 <- data.table(
  ID = c(1,1,2,2,3),
  Timepoint = as.Date(c("2019-05-20", "2019-05-24", "2018-09-03", "2018-06-04", "2018-06-10")),
  Values = c(11, 30, 35, 23, 42))

na_rows <- which(is.na(df1$Values))
df1[na_rows, Values := df2[ df1[na_rows,], x.Values, on = .(ID, Timepoint), roll = Inf ] ]
print(df1)

Hope this helps!

Scratch that, missed the fact that your grouping by ID as well. In that case, add ID to the on argument, edited to be correct in the code above.

This has worked! I tried with `Inf` and `"nearest"`, which both worked how I expected it to. Thanks! — DVNST, Apr 28 '23 at 07:31

Replace NA with values from another data frame, nearest in time, by group

1 Answers1