Match and count total words from an external list in text strings (tweets) in R

Question

I am attempting to conduct emotional sentiment analysis of a large corpus of Tweets (91k) using an external list of emotionally charged words (from the NRC Emotion Lexicon). To do this, I want to count the total number of times any word from the words-of-joy list appears in each Tweet. Ideally, this would be a partial match of the word rather than an exact match (so that, for example, "loved" counts for "love"). I would like the total to appear in a new column in the data frame.
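To make the possible matching rules concrete, here is a small base-R sketch. The sample strings (including the hypothetical "I lost my glove") and the helper name `count_matches` are illustrative assumptions, not from the question. A plain substring pattern also counts "love" inside "glove", a whole-word pattern misses "loved" and "Champions", and a pattern anchored only at the start of a word (`\\b` before the alternation, nothing after it) matches the behaviour the desired output below implies.

```r
# Hypothetical sample strings; only the first two come from the question
txt <- c("The movie was excellent, loved it.",
         "Eager to see the Champions Cup next week",
         "I lost my glove")
joy <- c("love", "excellent", "eager", "champion")
alt <- paste(joy, collapse = "|")

# Hypothetical helper: number of case-insensitive regex matches in each string
count_matches <- function(x, pattern) {
  m <- gregexpr(pattern, x, ignore.case = TRUE, perl = TRUE)
  # gregexpr() returns -1 when a string has no match, so count positive positions only
  vapply(m, function(hits) sum(hits > 0), integer(1))
}

substring_pat  <- alt                                # anywhere inside a word
whole_word_pat <- paste0("\\b(?:", alt, ")\\b")      # exact whole words only
word_start_pat <- paste0("\\b(?:", alt, ")")         # start of a word, any ending

count_matches(txt, substring_pat)   # 2 2 1
count_matches(txt, whole_word_pat)  # 1 1 0
count_matches(txt, word_start_pat)  # 2 2 0
```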

The data frame and column holding the Tweets are `Tweets_with_Emotions$full_text`, and the word list is `Words_of_joy$word`.

Example 1

```
> head(Tweets_with_Emotions, n=10)
  ID       Date      full_text
1  58150 2012-09-12  I love an excellent cookie
2  12357 2012-09-28  Oranges are delicious and excellent
3  50788 2012-10-04  Eager to visit Disneyland
4  66038 2012-10-11  I wish my boyfriend would propose already
5  18119 2012-10-11  Love Maggie Smith
6  48349 2012-10-14  The movie was excellent, loved it.
7  23328 2012-10-16  Pineapples are so delicious and excellent
8  66038 2012-10-26  Eager to see the Champions Cup next week
9  32717 2012-10-28  Hating this show
10 11345 2012-11-08  Eager for the food
```

Example 2

```
> head(Words_of_joy, n=5)
    word
1   eager
2   champion
3   delicious
4   excellent
5   love
```

Desired output

```
> head(New_df, n=10)
  ID       Date      full_text                                     joy_count
1  58150 2012-09-12  I love an excellent cookie                    2
2  12357 2012-09-28  Oranges are delicious and excellent           2
3  50788 2012-10-04  Eager to visit Disneyland                     1
4  66038 2012-10-11  I wish my boyfriend would propose already     0
5  18119 2012-10-11  Love Maggie Smith                             1
6  48349 2012-10-14  The movie was excellent, loved it.            2
7  23328 2012-10-16  Pineapples are so delicious and excellent     2
8  66038 2012-10-26  Eager to see the Champions Cup next week      2
9  32717 2012-10-28  Hating this show                              0
10 11345 2012-11-08  Eager for the food                            1
```
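A minimal sketch that reproduces the desired `joy_count` column, assuming a case-insensitive match anchored at the start of a word (so "loved" counts for "love" and "Champions" for "champion"). It uses only base R (`gregexpr()`), and the data frames are re-typed from the two examples above. It also assumes the lexicon entries contain no regex metacharacters; entries that might would need escaping before being pasted into the pattern.

```r
# Sample data re-typed from the question
Tweets_with_Emotions <- data.frame(
  ID = c(58150, 12357, 50788, 66038, 18119, 48349, 23328, 66038, 32717, 11345),
  Date = as.Date(c("2012-09-12", "2012-09-28", "2012-10-04", "2012-10-11", "2012-10-11",
                   "2012-10-14", "2012-10-16", "2012-10-26", "2012-10-28", "2012-11-08")),
  full_text = c("I love an excellent cookie",
                "Oranges are delicious and excellent",
                "Eager to visit Disneyland",
                "I wish my boyfriend would propose already",
                "Love Maggie Smith",
                "The movie was excellent, loved it.",
                "Pineapples are so delicious and excellent",
                "Eager to see the Champions Cup next week",
                "Hating this show",
                "Eager for the food"),
  stringsAsFactors = FALSE
)
Words_of_joy <- data.frame(word = c("eager", "champion", "delicious", "excellent", "love"),
                           stringsAsFactors = FALSE)

# One pattern for the whole list, anchored at the start of a word only,
# so "loved" counts for "love" but "glove" would not
joy_pattern <- paste0("\\b(?:", paste(Words_of_joy$word, collapse = "|"), ")")

hits <- gregexpr(joy_pattern, Tweets_with_Emotions$full_text,
                 ignore.case = TRUE, perl = TRUE)
# gregexpr() returns -1 for a Tweet with no match, so count only positive positions
Tweets_with_Emotions$joy_count <- vapply(hits, function(h) sum(h > 0), integer(1))

Tweets_with_Emotions[, c("ID", "full_text", "joy_count")]
```

With stringr loaded, `stringr::str_count(Tweets_with_Emotions$full_text, stringr::regex(joy_pattern, ignore_case = TRUE))` should give the same counts for these plain-ASCII Tweets. Building one combined pattern means each Tweet is scanned once, instead of looping over the word list in R.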

I've managed to run the emotion list against the Tweets so that it reports whether any word from the list appears in each Tweet (no = 0, yes = 1). However, I can't figure out how to count the matches and return the totals in a new column. This is my current code:

```r
# Keeps only the rows whose full_text contains at least one word from the list
new_df <- Tweets_with_Emotions[stringr::str_detect(Tweets_with_Emotions$full_text, paste(Words_of_negative$words, collapse = '|')), ]
```
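The attempt above uses `str_detect()`, which returns one TRUE/FALSE per Tweet, to subset rows. Getting a count instead means computing the number of matches per Tweet and assigning it to a new column rather than filtering; in stringr terms that is `str_count()` in place of `str_detect()`. The base-R sketch below shows the difference (`grepl()` versus `gregexpr()` with `regmatches()`) on three of the example Tweets. The names `tweets`, `joy_words`, and `has_joy` are hypothetical, and it keeps the attempt's plain substring pattern, so it would also count "love" inside a word such as "glove".

```r
# Hypothetical subset of the example Tweets
tweets <- data.frame(full_text = c("I love an excellent cookie",
                                   "Hating this show",
                                   "Eager for the food"),
                     stringsAsFactors = FALSE)
joy_words <- c("eager", "champion", "delicious", "excellent", "love")
pattern <- paste(joy_words, collapse = "|")

# What the attempt computes: one yes/no per Tweet, stored here as 0/1
tweets$has_joy <- as.integer(grepl(pattern, tweets$full_text, ignore.case = TRUE))

# What the question asks for: how many list words occur in each Tweet
matches <- regmatches(tweets$full_text,
                      gregexpr(pattern, tweets$full_text, ignore.case = TRUE))
tweets$joy_count <- lengths(matches)

tweets
```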

I'm extremely new to R (and Stack Overflow!) and have been struggling to figure this out for a few days, so any help would be greatly appreciated!

Please make your question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — NelsonGon, Jun 08 '19 at 16:45
@NelsonGon Apologies for the insufficient post. I hope you find this comprehensive now. — Katie, Jun 08 '19 at 18:58
Hi Katie! You may find the book [Text Mining with R](https://www.tidytextmining.com/sentiment.html) useful for this. It is easy to read and apply. — Pavel Filatov, Jun 08 '19 at 20:10
Merge the two data frames by date and count based on word. Use `str_count`. — NelsonGon, Jun 09 '19 at 04:02
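An alternative that keeps a separate count for each joy word: build one column per word, then add the columns with `rowSums()` to get the per-Tweet total. This is a base-R sketch with hypothetical names (`tweets`, `joy_words`, `per_word`) and the same assumed start-of-word matching. One caveat of this layout: if two lexicon entries share a prefix (say "love" and "lovely"), a single Tweet word would be counted under both columns.

```r
# Hypothetical subset of the example Tweets
tweets <- c("I love an excellent cookie",
            "Eager to see the Champions Cup next week",
            "Hating this show")
joy_words <- c("eager", "champion", "delicious", "excellent", "love")

# Matrix with one row per Tweet and one column per joy word:
# each cell counts occurrences of that word at the start of a word (case-insensitive)
per_word <- vapply(joy_words, function(w) {
  hits <- gregexpr(paste0("\\b", w), tweets, ignore.case = TRUE, perl = TRUE)
  vapply(hits, function(h) sum(h > 0), integer(1))
}, integer(length(tweets)))
rownames(per_word) <- seq_along(tweets)

per_word
rowSums(per_word)  # total joy words per Tweet
```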
