
I am a noob in R, and I have been trying to compare two data frames derived from text mining. Each has two columns: one with words and the other with counts. Call them dataframe1 and dataframe2.

I am trying to figure out how to write code that selects the words that are present in dataframe2 but not in dataframe1.

If we had to do this in Excel, we would use the words in dataframe2 as the reference, VLOOKUP them in the list of words from dataframe1, keep the ones that return #N/A, and then sort those #N/A rows by count, highest first.
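For comparison, the Excel workflow translates fairly directly into base R, because `match()` behaves like VLOOKUP: it returns the position of each value in the lookup table, or `NA` (the equivalent of #N/A) when the value is missing. The sketch below assumes the columns are called `Check` and `Count`, as in the answer further down; the words and counts are made up, since the real tables were only posted as images.

```r
# Hypothetical data: the real tables were only shared as screenshots,
# so these words and counts are invented for illustration.
dataframe1 <- data.frame(Check = c("A", "B", "C", "D", "F"),
                         Count = c(10, 7, 5, 3, 2),
                         stringsAsFactors = FALSE)
dataframe2 <- data.frame(Check = c("C", "F", "G", "H", "J"),
                         Count = c(8, 6, 4, 12, 9),
                         stringsAsFactors = FALSE)

# match() plays the role of VLOOKUP: for each word in dataframe2 it returns
# its row position in dataframe1, or NA when the word is absent (Excel's #N/A).
pos <- match(dataframe2$Check, dataframe1$Check)

# Keep only the "#N/A" rows, then sort them by Count, highest first.
not_in_df1 <- dataframe2[is.na(pos), ]
not_in_df1 <- not_in_df1[order(not_in_df1$Count, decreasing = TRUE), ]
rownames(not_in_df1) <- NULL

print(not_in_df1)
```

With this made-up data, C and F are dropped because they occur in both tables, and the remaining words come out ordered H, J, G.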

Below are pictures to explain in detail. dataframe1:

[Image: dataframe1]

dataframe2:

[Image: dataframe2]

As you can see, the words C and F appear in both dataframe1 and dataframe2, so they have to be excluded. The result should look like this:

Expected Output:

[Image: expected output]

Can someone help me? I have been trying for hours now. Thanks in advance.

Mr Pool
  • By the way, it would be easier to answer your question (and your question would be more accessible) if you include the data in a *text-based* format (e.g. in a code block) rather than as images ... – Ben Bolker Apr 23 '21 at 01:04
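One common way to share data in a text-based format is `dput()`, which prints R code that recreates an object exactly, so others can copy and paste it into their own session. The sketch below uses a hypothetical stand-in for dataframe1, because the actual words and counts are only visible in the screenshots.

```r
# Hypothetical stand-in for the data shown in the screenshots.
dataframe1 <- data.frame(Check = c("A", "B", "C", "F"),
                         Count = c(10, 7, 5, 2),
                         stringsAsFactors = FALSE)

# dput() prints R code that rebuilds the object; pasting that output into
# the question makes the data reproducible by copy and paste.
dput(dataframe1)

# For a large table, sharing only the first rows is usually enough.
dput(head(dataframe1, 20))
```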

1 Answer


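There's a dplyr function to do this called anti_join. anti_join(x, y) keeps the rows of x that have no match in y, so dataframe2 has to be the first argument to get the words that are in dataframe2 but not in dataframe1:

library(dplyr)
anti_join(dataframe2, dataframe1, by = "Check")

To sort the result in descending order of Count (thanks to Ben Bolker for pointing out that part of the question), you can use arrange:

library(dplyr)
dataframe2 %>%
  anti_join(dataframe1, by = "Check") %>%  # drop words that also appear in dataframe1
  arrange(desc(Count))                      # highest count first
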
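Because anti_join(x, y) keeps the rows of x with no match in y, the argument order decides which question gets answered. The base-R sketch below, on hypothetical data (the real tables were only posted as images), shows the same filtering with `%in%` and what happens when the two data frames are swapped. It is also an option if you would rather not load dplyr.

```r
# Hypothetical data, assuming the column names Check and Count.
dataframe1 <- data.frame(Check = c("A", "B", "C", "F"),
                         Count = c(10, 7, 5, 2),
                         stringsAsFactors = FALSE)
dataframe2 <- data.frame(Check = c("C", "F", "G", "H"),
                         Count = c(8, 6, 4, 12),
                         stringsAsFactors = FALSE)

# Rows of dataframe2 whose word never appears in dataframe1:
# the same rows anti_join(dataframe2, dataframe1, by = "Check") keeps.
only_in_df2 <- dataframe2[!(dataframe2$Check %in% dataframe1$Check), ]

# Swapping the arguments answers the opposite question.
only_in_df1 <- dataframe1[!(dataframe1$Check %in% dataframe2$Check), ]

print(only_in_df2)  # words G and H
print(only_in_df1)  # words A and B
```

Sorting only_in_df2 by Count with `order(only_in_df2$Count, decreasing = TRUE)` then mirrors the arrange(desc(Count)) step.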
Elle