I have a joining problem that I'm struggling with: the IDs I want to use to join two separate data frames are spread out across three possible ID columns. I'd like rows to join if at least one ID matches. I know the *_join and merge functions accept a vector of column names, but is it possible to make this work conditionally?
For example, if I have the following two data frames:
df_A <- data.frame(dta = c("FOO", "BAR", "GOO"),
                   id1 = c("abc", "", "bcd"),
                   id2 = c("", "", "xyz"),
                   id3 = c("def", "fgh", ""), stringsAsFactors = FALSE)
df_B <- data.frame(dta = c("FUU", "PAR", "KOO"),
                   id1 = c("abc", "", ""),
                   id2 = c("", "xyz", "zzz"),
                   id3 = c("", "", ""), stringsAsFactors = FALSE)
> df_A
  dta id1 id2 id3
1 FOO abc     def
2 BAR         fgh
3 GOO bcd xyz
> df_B
  dta id1 id2 id3
1 FUU abc
2 PAR     xyz
3 KOO     zzz
I hope to end up with something like this:
  dta.x dta.y id1 id2 id3
1   FOO   FUU abc  "" def   [matched on id1]
2   BAR    ""  ""  "" fgh   [unmatched]
3   GOO   PAR bcd xyz  ""   [matched on id2]
4    ""   KOO  "" zzz  ""   [unmatched]
That is, unmatched dta.x and dta.y values are retained, but where there is a match (rows 1 and 3 above) dta.x and dta.y are joined into a single row of the new table. I have a sense that none of *_join, merge, or match will work as is and that I'd need to write a function, but I'm not sure where to start. Any help or ideas appreciated. Thank you.
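To pin down the rule I mean, here is a naive base-R sketch of the behaviour I'm after. The helper name join_any_id is made up, and it rests on assumptions I haven't thought through fully: an empty string means "no ID", each row matches at most one row in the other data frame, the ID columns are checked in order (id1, then id2, then id3), and a matched row keeps df_A's ID where it is non-empty and falls back to df_B's otherwise. I doubt it is the idiomatic way to do this.

```r
df_A <- data.frame(dta = c("FOO", "BAR", "GOO"),
                   id1 = c("abc", "", "bcd"),
                   id2 = c("", "", "xyz"),
                   id3 = c("def", "fgh", ""), stringsAsFactors = FALSE)
df_B <- data.frame(dta = c("FUU", "PAR", "KOO"),
                   id1 = c("abc", "", ""),
                   id2 = c("", "xyz", "zzz"),
                   id3 = c("", "", ""), stringsAsFactors = FALSE)

# Hypothetical helper: full join of a and b where two rows pair up if ANY of
# the ID columns holds the same non-empty value in both data frames.
join_any_id <- function(a, b, ids = c("id1", "id2", "id3")) {
  # hit[i] = row of b matched by row i of a (NA if no ID column matches)
  hit <- rep(NA_integer_, nrow(a))
  for (id in ids) {
    m <- match(a[[id]], b[[id]])
    m[a[[id]] == ""] <- NA        # assumption: "" means no ID, never a match
    todo <- is.na(hit)            # only fill rows not matched on an earlier ID
    hit[todo] <- m[todo]
  }

  # Value of column `col` in the matched row of b, or "" when unmatched
  from_b <- function(col) {
    v <- b[[col]][hit]
    v[is.na(v)] <- ""
    v
  }

  # One output row per row of a, carrying b's data where a match exists
  out <- data.frame(dta.x = a$dta, dta.y = from_b("dta"),
                    stringsAsFactors = FALSE)
  for (id in ids) {
    out[[id]] <- ifelse(a[[id]] != "", a[[id]], from_b(id))
  }

  # Append the rows of b that no row of a matched
  unused <- setdiff(seq_len(nrow(b)), hit)
  rest <- data.frame(dta.x = rep("", length(unused)),
                     dta.y = b$dta[unused],
                     b[unused, ids, drop = FALSE],
                     stringsAsFactors = FALSE)
  out <- rbind(out, rest)
  rownames(out) <- NULL
  out
}

res <- join_any_id(df_A, df_B)
```

On the example data this pairs FOO with FUU (via id1) and GOO with PAR (via id2), and keeps BAR and KOO as unmatched rows, with KOO ending up in dta.y. It loops over the ID columns explicitly, so it would presumably need rethinking if one row could match several rows in the other data frame.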