I have 2 dataframes in R: 'dfold' with 175 variables and 'dfnew' with 75 variables. The 2 datframes are matched by a primary key (that is 'pid'). dfnew is a subset of dfold, so that all variables in dfnew are also on dfold but with updated, imputed values (no NAs anymore). At the same time dfold has more variables, and I will need them in the analysis phase. I would like to merge the 2 dataframes in dfmerge so to update common variables from dfnew --> dfold but at the same time retaining pre-existing variables in dfold. I have tried merge(), match(), dplyr, and sqldf packages, but either I obtain a dfmerge with the updated 75 variables only (left join) or a dfmerge with 250 variables (old variables with NAs and new variables without them coexist). The only way I found (here) is an elegant but pretty long (10 rows) loop that is eliminating *.x variables after a merge by pid with all.x = TRUE option). Might you please advice on a more efficient way to obtain such result if available ?
Thank you in advance
P.S: To make things easier, I have created a minimal version of dfold and dfnew: dfnew has now 3 variables, no NAs, while dfold has 5 variables, NAs included. Here it is the dataframes structure
dfold:
structure(list(Country = structure(c(1L, 3L, 2L, 3L, 2L), .Label = c("France",
"Germany", "Spain"), class = "factor"), Age = c(44L, 27L, 30L,
38L, 40L), Salary = c(72000L, 48000L, 54000L, 61000L, NA), Purchased = structure(c(1L,
2L, 1L, 1L, 2L), .Label = c("No", "Yes"), class = "factor"),
pid = 1:5), .Names = c("Country", "Age", "Salary", "Purchased",
"pid"), row.names = c(NA, 5L), class = "data.frame")
dfnew:
structure(list(Age = c(44, 27, 30), Salary = c(72000, 48000,
54000), pid = c(1, 2, 3)), .Names = c("Age", "Salary", "pid"), row.names = c(NA,
3L), class = "data.frame")
Although here the issue is limited to just 2 variables Please remind that the real scenario will involve 75 variables.