0

When performing a left outer join, how do I find the elements from the right table that didn't find a match in the left table? Do you just do a right outer join and discard elements with a match to be left with the elements from the right table without a match?

That seems like a sketchy solution, so is there a more elegant way of finding the orphaned elements in the right table?

thymaro
  • 134
  • 8
  • 1
    After a left join, there is no immediate way to find the un-matched elements. You can use `anti_join` to find what isn't found, in a separate merge operation. – r2evans Jun 20 '20 at 23:15
  • 1
    OR you can do a `full_join` and find the elements that have missing data in the left-data's fields. – r2evans Jun 20 '20 at 23:15

2 Answers2

2

Once a left_join is done, you cannot find out what wasn't matched. As Petr suggested in that answer, you can subsequently use anti_join to find what doesn't match.

Another technique (that only requires one merge operation) is to do a full join and filter on elements unique to the left and to the right to see what is missing.

Using datasets used in the examples of full_join:

full_join(band_members, band_instruments)
# Joining, by = "name"
# # A tibble: 4 x 3
#   name  band    plays 
#   <chr> <chr>   <chr> 
# 1 Mick  Stones  <NA>  
# 2 John  Beatles guitar
# 3 Paul  Beatles bass  
# 4 Keith <NA>    guitar

In this example, one can approximate the left-join with filter(!is.na(band)) and right-join with filter(!is.na(plays)), and finally one can get the second frame's unmatched elements with filter(is.na(plays)).

In this example, it's "clear" since there were no NA values before the merge. If there is no column that is known to never be NA (in either or both frames), then you can add one with low-cost. For instance mutate(band_members, orig=TRUE) (and same for band_instruments) will give you solid "known" columns.

r2evans
  • 141,215
  • 6
  • 77
  • 149
1

You are probably looking for anti_join() in this way:

right_table %>% anti_join(left_table)

This filters only rows in the right_table, which don't match any row in the left_table.