When you use a dplyr join
function like full_join
, columns with identical names are duplicated and given suffixes like "col.x", "col.y", "col.x.x", etc. when they are not used to join the tables.
library(dplyr)
data1<-data.frame(
Code=c(2,1,18,5),
Country=c("Canada", "USA", "Brazil", "Iran"),
x=c(50,29,40,29))
data2<-data.frame(
Code=c(2,40,18),
Country=c("Canada","Japan","Brazil"),
y=c(22,30,94))
data3<-data.frame(
Code=c(25,14,52),
Country=c("China","Japan","Australia"),
z=c(22,30,94))
data4<-Reduce(function(...) full_join(..., by="Code"), list(data1,data2,data3))
This results in "Country", "Country.x", and "Country.y" columns.
Is there a way to combine the three columns into one, such that if a row has NA for a "Country", it takes the value from "Country.x" or "Country.y"?
I attempted a solution based on this similar question, but it gives me a warning and returns only values from the top three rows.
data4<-Reduce(function(...) full_join(..., by="Code"), list(data1,data2,data3)) %>%
mutate(Country=coalesce(Country.x,Country.y,Country)) %>%
select(-Country.x, -Country.y)
This returns the warning invalid factor level, NA generated
.
Any ideas?