Given a data with missing values, imputation is a process where the missing values are substituted with some values. The goal is to ignore the rows with missing values, denoted with NAs. Such row could be seen as a component of the data hence the process called item imputation.
Input
df1 <- data.frame(ID=c(1,2,5,6),V1=c(7,77,777,NA))
df2 <- data.frame(ID=c(1,3,5,6),V2=c(6,66,666,6666))
df3 <- data.frame(ID=c(1,3,5,6),V3=c(9,NA,999,9999))
or alternatively in CSV format where misings values are marked with NAs
data.csv data2.csv data3.csv
ID V1 ID V2 ID V3
1 7 1 6 1 9
2 77 2 NA 2 NA
3 NA 3 66 3 NA
4 NA 4 NA 4 NA
5 777 5 666 5 999
6 NA 6 6666 6 9999
Output
Expected result is
ID V1 V2 V3
1 7 6 9
5 777 666 999
where we we wanted just lines without any NA value.
How to merge the input data with columns V1, V2, V3 and a common column ID with no NA on a row?
Example solution with SQLDF to merge the columns with common ID and no NA
library(sqldf)
# Read in the data: with CSV, you can use read.csv or fread from data.table
df1 <- data.frame(ID=c(1,2,5,6),V1=c(7,77,777,NA))
df2 <- data.frame(ID=c(1,3,5,6),V2=c(6,66,666,6666))
df3 <- data.frame(ID=c(1,3,5,6),V3=c(9,NA,999,9999))
#
sqldf("SELECT a.ID, a.V1, b.V2, c.V3 FROM df1 a, df2 b, df3 c WHERE a.ID=b.ID AND b.ID=c.ID AND V1!='NA'")
resulting to
ID V1 V2 V3
1 1 7 6 9
2 5 777 666 999