R: Counting frequency of words in a character column

Question

I'm trying to count the number of times that some pre-specified words appear in a character column, Post.

This is what my dataset looks like: Data

Now, I want to count all green/sustainable words in each of the posts and add this number as an extra column.

I have manually created a lexicon where all green words have Polarity == 1 and non-green words have Polarity == 0.

Lexicon

How can I do this?

Welcome to SO! Please post a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of your data using `dput()` rather than images so people can help you. — SamR, Jul 13 '22 at 13:48
It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please do not post data or code as images because we cannot easily copy/paste those values into R for testing. — MrFlick, Jul 13 '22 at 13:50
The existing answers here may also be helpful: https://stackoverflow.com/questions/7597559/grep-using-a-character-vector-with-multiple-patterns — MrFlick, Jul 13 '22 at 13:52
For future reference: [why should I not upload images of code/data?](https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors-when-asking-a-question/285557#285557) — Andrea M, Jul 15 '22 at 20:33

Andrea M · Answer 1 · 2022-07-15T20:24:55.700

`str_count()` from stringr can help with this (and with many other string-based tasks; see this R4DS chapter).

library(stringr)

# Create a reproducible example
dat <- data.frame(Post = c(
      "This is a sample post without any target words",
      "Whilst this is green!",
      "And this is eco-friendly",
      "This is green AND eco-friendly!"))
lexicon <- data.frame(Word = c("green", "eco-friendly", "neutral"),
                      Polarity = c(1, 1, 0))

# Extract relevant words from lexicon
green_words <- lexicon$Word[lexicon$Polarity == 1]

# Create new variable
dat$n_green_words <- str_count(dat$Post, paste(green_words, collapse = "|"))

dat

Output:

#>                                             Post n_green_words
#> 1 This is a sample post without any target words             0
#> 2                          Whilst this is green!             1
#> 3                       And this is eco-friendly             1
#> 4                This is green AND eco-friendly!             2

Created on 2022-07-15 by the reprex package (v2.0.1)
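One caveat with the pattern above: joining the words with `|` gives a plain regular expression, so it matches substrings ("green" inside "greenhouse") and is case-sensitive ("Green" is not counted). If that matters for your data, a minimal sketch of a stricter variant might look like this. The example posts and the two-word `green_words` vector are made up for illustration, and it assumes your lexicon words contain no regex metacharacters (otherwise they would need escaping first).

```r
library(stringr)

# Hypothetical example data, not the asker's real dataset
dat <- data.frame(Post = c(
  "Green energy is great",
  "The greenhouse is warm",
  "This is green AND eco-friendly!"))
green_words <- c("green", "eco-friendly")

# Wrap the alternation in word boundaries so that "green"
# does not match inside "greenhouse"
pattern <- paste0("\\b(", paste(green_words, collapse = "|"), ")\\b")

# regex(..., ignore_case = TRUE) lets "Green" count as well
dat$n_green_words <- str_count(dat$Post, regex(pattern, ignore_case = TRUE))

dat
```

Whether you want whole-word, case-insensitive matching depends on how the lexicon was built, so treat this as an option rather than a replacement for the code above.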
