
I'd like to stick with dplyr if possible (I'm a major fan), though base R is fine too if the solution is simple.

Suppose you have two data frames, as shown below. I'd like to create a new data frame that compares the two (df1 and df2) and keeps only those complete rows from df1 whose ID doesn't appear among the IDs in df2. df2 serves as the "knock-out" list. How would this be done?

df1 <- data.frame(
  ID = c(1,2,3,4,5),
  Value = c(10,20,30,40,50)
)

df2 <- data.frame(
  ID = c(6,7,8,1,2),
  Value = c(60,70,80,10,20)
)

The new data frame, call it df3, would look like this after applying the df2 "knock-outs", when printed in the RStudio console:

  ID Value
1  3    30
2  4    40
3  5    50

IDs 1 and 2 got knocked out because they appear in both df1 and df2.

2 Answers

This can be achieved via an anti_join, which returns all rows of the first data frame that have no match in the second:

dplyr::anti_join(df1, df2, by = "ID")
#>   ID Value
#> 1  3    30
#> 2  4    40
#> 3  5    50
stefan
  • I knew dplyr would have an elegant function for this! And I won't forget a name like "anti_join"! Quick question: does putting "dplyr::" at the beginning help in any way, other than denoting that this is a dplyr function? – Curious Jorge - user9788072 Dec 30 '21 at 12:39
  • It depends. (: Here I simply used it for brevity. Personally I use the `::` notation when I only need one or two functions from a package, when I need a function only once in a script, when I use functions inside my own packages, or when I want to avoid name clashes. In general, using `::` incurs a small performance cost; however, attaching a whole package also involves some one-time overhead, so there is a trade-off. – stefan Jan 02 '22 at 09:51
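To make the trade-off concrete, here is a minimal sketch (assuming dplyr is installed) showing that the `::` call and the attached-package call are interchangeable; the data frames below are just toy examples:

```r
df1 <- data.frame(ID = c(1, 2, 3), Value = c(10, 20, 30))
df2 <- data.frame(ID = c(1, 2), Value = c(10, 20))

# :: calls the function without attaching the whole package:
res1 <- dplyr::anti_join(df1, df2, by = "ID")

# library() attaches dplyr once, after which bare names resolve:
library(dplyr)
res2 <- anti_join(df1, df2, by = "ID")

identical(res1, res2)  # both calls give the same result
```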

As I mentioned in the comments, stefan's answer is best. Here is an alternative in base R, which subsets df1 to the rows whose ID is not in df2$ID:

df1[!(df1$ID %in% df2$ID),]
  ID Value
3  3    30
4  4    40
5  5    50

benchmark: (image of benchmark results from the original answer not reproduced here)
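Since the benchmark image is lost, here is a sketch of how such a comparison could be reproduced, assuming the microbenchmark and dplyr packages are installed (exact timings will vary by machine):

```r
library(microbenchmark)

df1 <- data.frame(ID = c(1, 2, 3, 4, 5), Value = c(10, 20, 30, 40, 50))
df2 <- data.frame(ID = c(6, 7, 8, 1, 2), Value = c(60, 70, 80, 10, 20))

# Time both approaches; each expression is run 100 times.
microbenchmark(
  anti_join = dplyr::anti_join(df1, df2, by = "ID"),
  base_r    = df1[!(df1$ID %in% df2$ID), ],
  times     = 100
)
```

On small data frames like these, the base R subset is typically faster because it avoids dplyr's overhead; the gap narrows on larger data.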

TarJae