Using Anti Join in R

Question

I am new to R, and I have been trying to compare two data frames derived from text mining. Each has two columns: one with words and the other with counts. Assume they are dataframe1 and dataframe2.

I am trying to figure out how to write code that selects the words that are present in dataframe2 but not in dataframe1.

In Excel, we would use the words in dataframe2 as the lookup values, VLOOKUP them against the list of words in dataframe1, keep the rows that return #N/A, and then sort those rows by count, highest first.

The pictures below explain this in detail.

dataframe1: (image)

dataframe2: (image)

As you can see, the words C and F are in both dataframe1 and dataframe2, so they have to be excluded, and the result should look like this.

Expected output: (image)

Can someone help me? I have been trying for hours now. Thanks in advance.

By the way, it would be easier to answer your question (and your question would be more accessible) if you include the data in a *text-based* format (e.g. in a code block) rather than as images ... — Ben Bolker, Apr 23 '21 at 01:04

Elle · Accepted Answer · 2021-04-23T01:02:47.580

There's a dplyr function for this called anti_join. anti_join(x, y) returns the rows of x that have no match in y:

library(dplyr)
anti_join(df1, df2, by = c('Check'))
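Since the original data was only posted as images, here is a minimal sketch with made-up data to show how the argument order matters. The values below are hypothetical; only the column names Check and Count and the shared words C and F come from the question. It assumes df1 and df2 stand for the question's dataframe1 and dataframe2.

```r
library(dplyr)

# Hypothetical sample data (the real data was posted as images).
# C and F appear in both data frames, as in the question.
df1 <- data.frame(Check = c("A", "B", "C", "F"),
                  Count = c(5, 3, 8, 2),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Check = c("C", "D", "E", "F", "G"),
                  Count = c(4, 9, 1, 6, 7),
                  stringsAsFactors = FALSE)

# Rows of df2 whose Check value does not appear in df1
only_in_df2 <- anti_join(df2, df1, by = "Check")

# Swapping the arguments gives the opposite set: rows of df1 not in df2
only_in_df1 <- anti_join(df1, df2, by = "Check")

print(only_in_df2)
print(only_in_df1)
```

To get the words that are in dataframe2 but not in dataframe1, as the question asks, the data frame you want to keep rows from goes first.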

To sort it in descending order of Count (thanks to Ben Bolker for pointing out that part of the question) you can use arrange.

library(dplyr)
df1 %>%
  anti_join(df2, by = c('Check')) %>%
  arrange(desc(Count))
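If you would rather not load dplyr, the same idea can be sketched in base R, which is close to the VLOOKUP and #N/A approach described in the question. This is only an illustration with hypothetical data (the column names Check and Count are taken from the code above), not a replacement for the anti_join solution.

```r
# Hypothetical sample data; C and F are in both data frames
df1 <- data.frame(Check = c("A", "B", "C", "F"),
                  Count = c(5, 3, 8, 2),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Check = c("C", "D", "E", "F", "G"),
                  Count = c(4, 9, 1, 6, 7),
                  stringsAsFactors = FALSE)

# TRUE where a word from df2 is also found in df1
# (the rows where VLOOKUP would have returned a match)
in_df1 <- df2$Check %in% df1$Check

# Keep only the unmatched rows (the "#N/A" rows in Excel terms)
result <- df2[!in_df1, ]

# Sort by Count, highest first
result <- result[order(result$Count, decreasing = TRUE), ]

print(result)
```

Here the data frame being filtered is df2, so the result holds the words found only in df2.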

edited Apr 23 '21 at 01:02

answered Apr 23 '21 at 00:53

Thanks, didn't see that part. Will edit it in. – Elle Apr 23 '21 at 00:59
