My question is similar to this one. Let's say I have two dataframes as shown below:
set.seed(5)
df1 <- data.frame(date=as.Date(c('2001-01-01', '2001-02-01', '2001-03-01', '2001-04-01', '2001-05-01')),
val1=rnorm(5), val2=c(rnorm(3), NA, NA), val3=c(rnorm(3), NA, NA))
df2 <- data.frame(date=as.Date(c('2001-03-01', '2001-04-01', '2001-05-01')),
val2=rnorm(3), val3=rnorm(3))
df1
date val1 val2 val3
1 2001-01-01 -0.84085548 -0.6029080 -0.2857736
2 2001-02-01 1.38435934 -0.4721664 0.1381082
3 2001-03-01 -1.25549186 -0.6353713 1.2276303
4 2001-04-01 0.07014277 NA NA
5 2001-05-01 1.71144087 NA NA
df2
date val2 val3
1 2001-03-01 -0.8017795 -1.0717600
2 2001-04-01 -1.0803926 -0.1389861
3 2001-05-01 -0.1575344 -0.5973131
df1
has some missing values in the columns val2
and val3
. I want to replace only the missing values in those columns with correct match, match according to date
from df2
. From the question that I linked to, here's a potential solution using data.table
:
library(data.table)
setDT(df1)
setDT(df2)
df1[i = df2, val2 := i.val2, on = .(date)]
df1
date val1 val2 val3
1: 2001-01-01 -0.84085548 -0.6029080 -0.2857736
2: 2001-02-01 1.38435934 -0.4721664 0.1381082
3: 2001-03-01 -1.25549186 -0.8017795 1.2276303
4: 2001-04-01 0.07014277 -1.0803926 NA
5: 2001-05-01 1.71144087 -0.1575344 NA
The problem here is that it has replaced the 3rd value of val2
in df1
with the matching value in df2
, even though the 3rd value in df1
was not originally missing. Also, how can I fill all missing values in df1
with their matching values in df2
instead of doing it one column at a time?