0

I have two dataframes. Both have variables "ID" (a grouping variable), "Timepoint" (class Date), and a value variable "Values". Some example data:

df1 <- data.frame(
  ID = c(1,1,1,1,2,2,3,3),
  Timepoint = as.Date(c("2019-05-21", "2019-05-23", "2019-05-26", "2019-05-27", "2018-07-03", "2018-08-03", "2018-06-04", "2018-06-10")),
  Values = c(NA, 23, NA, NA, NA, 35, 23, 42)
  )

  ID  Timepoint Values
1  1 2019-05-21     NA
2  1 2019-05-23     23
3  1 2019-05-26     NA
4  1 2019-05-27     NA
5  2 2018-07-03     NA
6  2 2018-08-03     35
7  3 2018-06-04     23
8  3 2018-06-10     42

df2 <- data.frame(
  ID = c(1,1,2,2,3),
  Timepoint = as.Date(c("2019-05-20", "2019-05-24", "2018-09-03", "2018-06-04", "2018-06-10")),
  Values = c(11, 30, 35, 23, 42))

  ID  Timepoint Values
1  1 2019-05-20     11
2  1 2019-05-24     30
3  2 2018-09-03     35
4  2 2018-06-04     23
5  3 2018-06-10     42

In "df1", the value variable has some missing values NA. I want to replace the NA "Values" in "df1" with values from "df2", which has the same "ID" and closest date ("Timepoint").

My desired result:

  ID  Timepoint Values
1  1 2019-05-21     11
2  1 2019-05-23     23
3  1 2019-05-26     30
4  1 2019-05-27     30
5  2 2018-07-03     23
6  2 2018-08-03     35
7  3 2018-06-04     23
8  3 2018-06-10     42

Note that for ID 1, the 30 has been used twice to fill in the missing value as both were closest to 2019-05-24.

I tried joining using roll = "nearest" from data.table. But from what I understand, each value from df2 was only being used once.

I'd preferably use lubridate, tidyverse. But open to other packages if needed.

Henrik
  • 65,555
  • 14
  • 143
  • 159
DVNST
  • 3
  • 2
  • `df1[is.na(Values), Values := df2[.SD, on = .(ID, Timepoint), roll = "nearest", x.Values]]`, seems to give the desired result. See e.g. [Find time to nearest occurrence of particular value for each row](https://stackoverflow.com/a/42382399/1851712) – Henrik Apr 27 '23 at 16:14
  • 1
    Does this answer your question? [How to join two dataframes by nearest time-date?](https://stackoverflow.com/questions/39282749/how-to-join-two-dataframes-by-nearest-time-date) – I_O Apr 27 '23 at 16:16
  • @I_O, I tried this solution, but I somehow lost some rows? Perhaps I did something wrong. But it did give an interesting results that I'll have to remember! – DVNST Apr 28 '23 at 08:02
  • @Henrik, this solution is similar to what chsoul suggested, and it works, so thanks! – DVNST Apr 28 '23 at 08:03

1 Answers1

0

Here's a solution using data.table

library(data.table)

df1 <- data.table(
  ID = c(1,1,1,1,2,2,3,3),
  Timepoint = as.Date(c("2019-05-21", "2019-05-23", "2019-05-26", "2019-05-27", "2018-07-03", "2018-08-03", "2018-06-04", "2018-06-10")),
  Values = c(NA, 23, NA, NA, NA, 35, 23, 42)
)
df2 <- data.table(
  ID = c(1,1,2,2,3),
  Timepoint = as.Date(c("2019-05-20", "2019-05-24", "2018-09-03", "2018-06-04", "2018-06-10")),
  Values = c(11, 30, 35, 23, 42))

na_rows <- which(is.na(df1$Values))
df1[na_rows, Values := df2[ df1[na_rows,], x.Values, on = .(ID, Timepoint), roll = Inf ] ]
print(df1)

Hope this helps!

Scratch that, missed the fact that your grouping by ID as well. In that case, add ID to the on argument, edited to be correct in the code above.

chsoul
  • 54
  • 3
  • This has worked! I tried with `Inf` and `"nearest"`, which both worked how I expected it to. Thanks! – DVNST Apr 28 '23 at 07:31