Fill only missing values in dataframe with values from another dataframe

Question

My question is similar to this one. Let's say I have two dataframes as shown below:

set.seed(5)
df1 <- data.frame(date=as.Date(c('2001-01-01', '2001-02-01', '2001-03-01', '2001-04-01', '2001-05-01')),
                  val1=rnorm(5), val2=c(rnorm(3), NA, NA), val3=c(rnorm(3), NA, NA))

df2 <- data.frame(date=as.Date(c('2001-03-01', '2001-04-01', '2001-05-01')),
                  val2=rnorm(3), val3=rnorm(3))
df1
        date        val1       val2       val3
1 2001-01-01 -0.84085548 -0.6029080 -0.2857736
2 2001-02-01  1.38435934 -0.4721664  0.1381082
3 2001-03-01 -1.25549186 -0.6353713  1.2276303
4 2001-04-01  0.07014277         NA         NA
5 2001-05-01  1.71144087         NA         NA

df2
        date       val2       val3
1 2001-03-01 -0.8017795 -1.0717600
2 2001-04-01 -1.0803926 -0.1389861
3 2001-05-01 -0.1575344 -0.5973131

df1 has some missing values in the columns val2 and val3. I want to replace only the missing values in those columns with correct match, match according to date from df2. From the question that I linked to, here's a potential solution using data.table:

library(data.table)
setDT(df1)
setDT(df2)
df1[i = df2, val2 := i.val2, on = .(date)]
df1
         date        val1       val2       val3
1: 2001-01-01 -0.84085548 -0.6029080 -0.2857736
2: 2001-02-01  1.38435934 -0.4721664  0.1381082
3: 2001-03-01 -1.25549186 -0.8017795  1.2276303
4: 2001-04-01  0.07014277 -1.0803926         NA
5: 2001-05-01  1.71144087 -0.1575344         NA

The problem here is that it has replaced the 3rd value of val2 in df1 with the matching value in df2, even though the 3rd value in df1 was not originally missing. Also, how can I fill all missing values in df1 with their matching values in df2 instead of doing it one column at a time?

Gopala · Accepted Answer · 2018-05-17T14:13:14.547

Here is one solution using dplyr package:

df1 <- df1 %>%
  left_join(., df2, by = 'date') %>%
  mutate(
    val2 = ifelse(is.na(val2.x), val2.y, val2.x),
    val3 = ifelse(is.na(val3.x), val3.y, val3.x)
    ) %>%
  select(date, val1, val2, val3)

df1
        date        val1       val2       val3
1 2001-01-01 -0.84085548 -0.6029080 -0.2857736
2 2001-02-01  1.38435934 -0.4721664  0.1381082
3 2001-03-01 -1.25549186 -0.6353713  1.2276303
4 2001-04-01  0.07014277 -1.0803926 -0.1389861
5 2001-05-01  1.71144087 -0.1575344 -0.5973131

Finally with the seed set. Damn, I kept getting different results! :)

With the coalesce option correctly pointed out below:

df1 %>%
  left_join(., df2, by = 'date') %>%
  mutate(
    val2 = coalesce(val2.x, val2.y),
    val3 = coalesce(val3.x, val3.y)
    ) %>%
  select(date, val1, val2, val3)

Cleaner than ifelse, for sure.

Fill only missing values in dataframe with values from another dataframe

1 Answers1