2

I'm trying to merge 2 datasets on a key, but if there is no match then I want to try another key, and so on.

df1 <- data.frame(a=c(5,1,7,3),
              b=c("T","T","T","F"),
              c=c("F","T","F","F"))


df2 <- data.frame(x1=c(4,5,3,9), 
                  x2=c(7,8,1,2),
                  x3=c("g","w","t","o"))
df1
   a  b  c
1  5  T  F
2  1  T  T
3  7  T  F
4  3  F  F

df2
   x1 x2 x3 ..
1  4  7  g  ..
2  5  8  w  ..
3  3  1  t  ..
4  9  2  o  ..

The desired output is something like

   a  b  c x3  ..
1  5  T  F  w  ..
2  1  T  T  t  ..
3  7  T  F  g  ..
4  3  F  F  t  ..

I tried something along the lines of

dfm <- merge(df1,df2, by.x = "a", by.y = "x1", all.x = TRUE)
dfm <- merge(dfm,df2, by.x = "a", by.y = "x2", all.x = TRUE)

but that isn't quite right.

Uwe
  • 41,420
  • 11
  • 90
  • 134
Alan Dursun
  • 655
  • 3
  • 14
  • Please check it https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – foc Oct 03 '18 at 20:11
  • 1
    Welcome to SO! Unfortunately, there is a discrepancy between the title of your question and the sample data and expected result. The question is titled _Merge on x1, or if no match x2, or if no match x3_ but column `x3` in the given sample dataset is not used for matching (and cannot be used as it has the wrong type). – Uwe Oct 04 '18 at 06:13
  • Ah, you are right. The sample data I provided does not use x3 as an addition key to try and match. Fortunately, the answers provided scale perfectly to x3, x4, ect. Thanks for the clarification. – Alan Dursun Oct 04 '18 at 13:54

4 Answers4

3

This really isn't a standard sort of merge. You can make it more standard by reshaping df2 so you have just one field to merge on

df2long <- rbind(
    data.frame(a = df2$x1, df2[,-(1:2), drop=FALSE]), 
    data.frame(a = df2$x2, df2[,-(1:2), drop=FALSE])
)
dfm <- merge(df1, df2long, by = "a", all.x = TRUE)
MrFlick
  • 195,160
  • 17
  • 277
  • 295
  • Very concise solution to the problem. Thanks so much for your quick response! – Alan Dursun Oct 03 '18 at 20:52
  • 1
    This answer returns the expected result only for the given sample datasets. But it will return duplicate entries if there is a match in both columns `x1` and `x2`. Please, see the enhanced datasets [here](https://stackoverflow.com/a/52636778/3817004). – Uwe Oct 04 '18 at 05:48
  • That's a good catch, @Uwe. A quick fix for removing duplicates is a simple `df[!duplicated(df),]`. – Alan Dursun Oct 09 '18 at 22:10
3

You could do something like this:

matches <- lapply(df2[, c("x1", "x2")], function(x) match(df1$a, x))
# finding matches in df2$x1 and df2$x2
# notice that the code below should work with any number of columns to be matched:
# you just need to add the names here eg. df2[, paste0("x", 1:100)] 
matches
$x1
[1]  2 NA NA  3

$x2
[1] NA  3  1 NA
combo <- Reduce(function(a,b) "[<-"(a, is.na(a), b[is.na(a)]), matches)
# combining the matches on "first come first served" basis
combo
[1] 2 3 1 3
cbind(df1, df2[combo,])
    a b c x1 x2 x3
2   5 T F  5  8  w
3   1 T T  3  1  t
1   7 T F  4  7  g
3.1 3 F F  3  1  t
Uwe
  • 41,420
  • 11
  • 90
  • 134
lebatsnok
  • 6,329
  • 2
  • 21
  • 22
  • Since OP's desired output doesn't have x1, x2, could do `cbind(df1, df2[combo, -which(names(df2) %in% c("x1", "x2")), drop=FALSE])` if that matters. – Frank Oct 03 '18 at 20:28
  • Clever approach to the problem, definitely helpful in seeing your thought process to arrive at the solution. Many thanks! – Alan Dursun Oct 03 '18 at 20:53
  • 1
    This approach returns the expected result also with the more complete use cases [here](https://stackoverflow.com/a/52636778/3817004). Furthermore, precedence can be controlled. – Uwe Oct 04 '18 at 06:28
2

If I understand correctly, the OP has requested to try a match of a with x1 first, then - if failed - to try to match a with x2. So any match of a with x1 should take precedence over a match of a with x2.

Unfortunately, the sample data set provided by the OP does not include a use case to prove this. Therefore, I have modified the sample dataset accordingly (see Data section).

The approach suggested here is to reshape df2 from wide to long format (likewise to MrFlick's answer) but to use a data.table join with parameter mult = "first".

The columns of df2 to be considered as key columns and the precedence can be controlled by the measure.vars parameter to melt(). After reshaping, melt() arranges the rows in the column order given in measure.vars:

library(data.table)
# define cols of df2 to use as key in order of 
key_cols <- c("x1", "x2")
# reshape df2 from wide to long format
long <- melt(setDT(df2), measure.vars = key_cols, value.name = "a")
# join long with df1, pick first matches
result <- long[setDT(df1), on = "a", mult = "first"]
# clean up
setcolorder(result, names(df1))
result[, variable := NULL]
result
   a b c   x3
1: 5 T F    w
2: 1 T T    t
3: 7 T F    g
4: 3 F F    t
5: 0 F F <NA>

Please, note that the original row order of df1 has been preserved.

Also, note that the code works for an arbitrary number of key columns. The precedence of key columns can be easily changed. E.g., if the order is reversed, i.e., key_cols <- c("x2", "x1") matches of a with x2 will be picked first.

Data

Enhanced sample datasets:

df1 has an additional row with no match in df2.

df1 <- data.frame(a=c(5,1,7,3,0),
                  b=c("T","T","T","F","F"),
                  c=c("F","T","F","F","F"))
df1
   a b c
1: 5 T F
2: 1 T T
3: 7 T F
4: 3 F F
5: 0 F F

df2 has an additional row to prove that a match in x1 takes precedence over a match in x2. The value 5 appears twice: In row 2 of column x1 and in row 5 of column x2.

df2 <- data.frame(x1=c(4,5,3,9,6), 
                  x2=c(7,8,1,2,5),
                  x3=c("g","w","t","o","n"))
df2
   x1 x2 x3
1:  4  7  g
2:  5  8  w
3:  3  1  t
4:  9  2  o
5:  6  5  n
Community
  • 1
  • 1
Uwe
  • 41,420
  • 11
  • 90
  • 134
0

Not sure I understood your question, but rather than repetitive merging I'd compare the keys of the potential merge, if this number is >0, than you have a match. If you want to take the first column with a match you can try this:

    library(tidyr)
    library(purrr)
    (df1 <- data.frame(a=c(5,1,7,3),
          b=c("T","T","T","F"),
          c=c("F","T","F","F")) )
    (df2 <- data.frame(x1=c(4,5,3,9), 
              x2=c(7,8,1,2),
              x3=c("g","w","t","o")) )

     FirstColMatch<-1:ncol(df2) %>% 
         map(~intersect(df1$a, df2[[.x]])) %>% 
         map(length)  %>%
         detect_index(function(x)x>0)

     NewDF<-merge(df1,df2,by.x="a", by.y =names(df2)[FirstColMatch])
Barum
  • 21
  • 3