Join two dataframes and eliminate duplicates

Question

I have a dataframe where the first column is called id (1, 2, 3, 4, 5, etc.) and identifies a dive. Another column specifies the dive type, which can be F or NF.

I have another dataframe that includes the id of all NF dives that are dubious and should be eliminated from the analysis.

How can I eliminate the rows in the first dataframe that have an id that is included in the second dataframe?

Example:

> df1

id  dive_type
 1          F
 2          F
 3         NF
 4          F
 5          F
 6          F
 7         NF
 8          F

> df2

id  dive_type
 1          F
 2          F
 5          F
 8          F

My goal is to delete all rows in df1 whose id is present in df2 (in this case ids 1, 2, 5 and 8) and get something like this:

> res

id  dive_type
 3         NF
 4          F
 6          F
 7         NF

Thanks

It’s easier to help if you make your question reproducible: include a minimal dataset as an R object, for example a data frame built with df <- data.frame(…), where … stands for your variables and values, or the output of dput(head(df)). Include the code you have tried and set out your expected answer. These links should be of help: [mre] and [ask] — Peter, Jul 07 '20 at 16:12
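
As a rough illustration of the comment above, the example tables from the question could be written as pasteable R objects along these lines. The values are transcribed from the printed df1 and df2; the column types (integer ids, character dive types) are an assumption, since the question does not show them.

```r
# Example data transcribed from the question (column types are assumed)
df1 <- data.frame(
  id = 1:8,
  dive_type = c("F", "F", "NF", "F", "F", "F", "NF", "F")
)
df2 <- data.frame(
  id = c(1L, 2L, 5L, 8L),
  dive_type = c("F", "F", "F", "F")
)

# dput() prints R code that recreates an object, ready to paste into a question
dput(df1)
```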

score 0 · Answer 1 · answered Jul 07 '20 at 16:31

You can use %in% to check whether each id in df1 is in df2$id, and subset df1 based on the negation of that:

df1[!(df1$id %in% df2$id), ]
#>   id dive_type
#> 3  3        NF
#> 4  4         F
#> 6  6         F
#> 7  7        NF
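
To see why this works, it may help to look at the intermediate logical vector. The sketch below rebuilds the question's data (column types are assumed, and in_df2 is just an illustrative name) and splits the one-liner into two steps:

```r
# Example data transcribed from the question (column types are assumed)
df1 <- data.frame(id = 1:8,
                  dive_type = c("F", "F", "NF", "F", "F", "F", "NF", "F"))
df2 <- data.frame(id = c(1, 2, 5, 8),
                  dive_type = c("F", "F", "F", "F"))

# TRUE for each row of df1 whose id also appears in df2$id
in_df2 <- df1$id %in% df2$id
in_df2
#> [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE

# Negate the vector to keep only the rows whose id is absent from df2
res <- df1[!in_df2, ]
```

Note that the match is on id alone; the dive_type column of df2 plays no part in the filtering.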

answered Jul 07 '20 at 16:31

Allan Cameron
