0

Using the R command left_join() from dplyr, which joins a dataframe to another, it is possible to tell it to check against two columns?

My problem is that it for a large number of rows, a join by column A will return NAs since there will be no match. However, there will be a match for these rows if the dataframe was instead joined by column B.

Can I tell left_join() to first try to join by column A, and if this returns an NA, try with column B? Most will be matched by column A, but I want to "save" the remaining rows by giving them a chance to join by another column.

Thanks for your help. Appreciate it.

Will M
  • 692
  • 9
  • 20
  • 3
    Please show a small reproducible example – akrun Jan 09 '21 at 23:39
  • 1
    Maybe this will help? [Can dplyr join on multiple columns or composite key?](https://stackoverflow.com/questions/26611717/can-dplyr-join-on-multiple-columns-or-composite-key) – Ian Campbell Jan 10 '21 at 00:23
  • 1
    You can bring column A and B in one column using `pivot_longer` and then perform the join. See this example https://stackoverflow.com/questions/65638384/how-to-left-join-on-any-of-the-matching-clauses-in-r/ – Ronak Shah Jan 10 '21 at 02:42

1 Answers1

1

The easiest approach is probably to first create a new column in your second dataframe which is equal to column A if it has a value, and equal to B if not. Then join on that new column.

MattB
  • 651
  • 3
  • 11