Is there a merging function that prioritizes non-missing values from common variables?
Consider the following example.
First we generate two data.frames with the same IDs but complementary missing values on a particular varuiable:
set.seed(1)
missings <- sample.int(6, 3)
df1 <- data.frame(ID = letters[1:6], V1 = NA)
df2 <- data.frame(ID = letters[1:6], V1 = NA)
df1$V1[missings] <- rnorm(3)
df2$V1[setdiff(1:6, missings)] <- rnorm(3)
Applying merge
or any of the join
functions from the dplyr
package produces results similar to the below:
> merge(df1, df2, by = 'ID')
ID V1.x V1.y
1 a NA -1.5399500
2 b 1.3297993 NA
3 c 0.4146414 NA
4 d NA -0.9285670
5 e NA -0.2947204
6 f 1.2724293 NA
We'd like to join these two data.frames in a "smarter" way that ignores missing values in one data.frame when not missing in the other to obtain the below output:
> output <- df1
> output$V1[is.na(df1$V1)] <- df2$V1[!(is.na(df2$V1))]
> output
ID V1
1 a -1.5399500
2 b 1.3297993
3 c 0.4146414
4 d -0.9285670
5 e -0.2947204
6 f 1.2724293
We can assume that df1
and df2
have totally complementary missing values of V1
.
EDIT
A solution that would work for an arbitrary number of variables would be ideal.