I have a list of words in a dataset and want to remove the stopwords from it.

My question: Any idea how to fix the following code?

library(tm)

new_words <- c("it", "apple", "carrot", "after", "snake")

# My first solution
removeWords(new_words, words = stopwords(kind = "en"))

# I have a problem with the second solution,
# because I want to use the %in% operator
new_words[new_words %in%
 words = stopwords(kind = "en")]
Ell
  • Remove the `words = ` part and try again. – Martin Gal Oct 17 '21 at 22:07
  • Thanks Martin, I did my best to organize my question well. How did you fix it? Sorry, this is my first time asking a question on this website. – Ell Oct 17 '21 at 22:08
  • Yes, I tried the code that way as well, but it did not work. – Ell Oct 17 '21 at 22:09
  • @Eljan, for formatting code and questions, see https://meta.stackexchange.com/a/22189 and https://stackoverflow.com/editing-help. – r2evans Oct 17 '21 at 22:10
  • @Eljan In this case please make a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Martin Gal Oct 17 '21 at 22:11
  • Please check the question again. I tried to describe it better; if you have any questions, please let me know. – Ell Oct 17 '21 at 22:19
  • `new_words[!new_words %in% stopwords::stopwords()]` should work. – Martin Gal Oct 17 '21 at 22:20
  • @Martin, I would like to follow this solution so I can understand it. I have a dataset with over 10000 words. How can I remove the stopwords using your solution? This code did not work on my side. – Ell Oct 17 '21 at 22:33
  • To be clear, it works on the example, but when I apply it to another dataset, which has over 10000 words, it does not work. – Ell Oct 17 '21 at 22:34
  • Your other dataset... is it also a character vector like your example or are you using some other structure like a data.frame? – Martin Gal Oct 17 '21 at 22:38
  • So, my project is basically to scan the reviews on a website and find the most frequent words. I have scanned the webpage and I have a lot of words. It is not a character vector. Now I am trying to remove the stopwords. – Ell Oct 17 '21 at 22:47
  • To answer your question it is necessary and important to know the structure of your data. The suggestion works for a character vector but will fail for a data.frame without small adjustments. – Martin Gal Oct 17 '21 at 22:51
  • Thank you, I finally figured it out. I really appreciate your help. – Ell Oct 17 '21 at 23:29
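
For reference, the `%in%` approach from the comments might look like the sketch below, for both a character vector and a data.frame. The short `stop_vec` list and the `reviews` data.frame with a `word` column are assumptions made so the example runs in base R. The real structure of the larger dataset was never posted. In practice `stop_vec` would be replaced by `tm::stopwords(kind = "en")` or `stopwords::stopwords("en")`.

```r
# Stand-in stop word list (an assumption for this sketch); in practice use
# tm::stopwords(kind = "en") or stopwords::stopwords("en") instead.
stop_vec <- c("it", "after", "the", "a", "and")

# Case 1: a plain character vector, as in the question.
# Note the `!`: we keep the words that are NOT in the stop word list.
new_words <- c("it", "apple", "carrot", "after", "snake")
kept <- new_words[!new_words %in% stop_vec]
print(kept)

# Case 2: a data.frame with one word per row
# (the data and the column name `word` are hypothetical).
reviews <- data.frame(word = c("It", "apple", "after", "snake", "apple"),
                      stringsAsFactors = FALSE)

# Lower-case before matching so that "It" is also caught;
# index the rows, and keep the result a data.frame with drop = FALSE.
reviews_clean <- reviews[!tolower(reviews$word) %in% stop_vec, , drop = FALSE]

# Most frequent remaining words
word_counts <- sort(table(reviews_clean$word), decreasing = TRUE)
print(word_counts)
```

The only difference for a data.frame is that the logical test is applied to one column and used to select rows, rather than subsetting the vector directly.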

1 Answer

Why not just use `anti_join(stop_words)`?

That's how Julia Silge does it in Text Mining with R: https://www.tidytextmining.com/index.html
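
A minimal sketch of this approach, assuming the words are stored one per row in a data frame whose column is named `word` (the `words_df` table here is made up for illustration). `stop_words` is the data frame that ships with the tidytext package, and `anti_join()` comes from dplyr.

```r
library(dplyr)     # anti_join(), count(), tibble()
library(tidytext)  # provides the stop_words data frame

# Hypothetical one-word-per-row table; `word` matches the column in stop_words
words_df <- tibble(word = c("it", "apple", "carrot", "after", "snake", "apple"))

# Keep only the rows whose word has no match in stop_words
cleaned <- words_df %>%
  anti_join(stop_words, by = "word")

# Most frequent remaining words
top_words <- cleaned %>%
  count(word, sort = TRUE)
print(top_words)
```

If the column has a different name, the `by` argument can map it, for example `by = c("token" = "word")`.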