I am attempting to conduct emotional sentiment analysis of a large corpus of Tweets (91k) with an external list of emotionally charged words (from the NRC Emotion Lexicon). To do this, I want to count how many times any word from the joy word list appears in each Tweet and sum those counts per Tweet. Ideally, this would be a partial match on the word rather than an exact match (for example, "loved" should count as a match for "love"). I would like the total to show in a new column in the df.
The df and column name for the Tweets are Tweets_with_Emotions$full_text and the list is Words_of_joy$word.
Example 1
> head(Tweets_with_Emotions, n=10)
ID Date full_text
1 58150 2012-09-12 I love an excellent cookie
2 12357 2012-09-28 Oranges are delicious and excellent
3 50788 2012-10-04 Eager to visit Disneyland
4 66038 2012-10-11 I wish my boyfriend would propose already
5 18119 2012-10-11 Love Maggie Smith
6 48349 2012-10-14 The movie was excellent, loved it.
7 23328 2012-10-16 Pineapples are so delicious and excellent
8 66038 2012-10-26 Eager to see the Champions Cup next week
9 32717 2012-10-28 Hating this show
10 11345 2012-11-08 Eager for the food
Example 2
> head(Words_of_joy, n=5)
word
1 eager
2 champion
3 delicious
4 excellent
5 love
Desired output
> head(New_df, n=10)
ID Date full_text joy_count
1 58150 2012-09-12 I love an excellent cookie 2
2 12357 2012-09-28 Oranges are delicious and excellent 2
3 50788 2012-10-04 Eager to visit Disneyland 1
4 66038 2012-10-11 I wish my boyfriend would propose already 0
5 18119 2012-10-11 Love Maggie Smith 1
6 48349 2012-10-14 The movie was excellent, loved it. 2
7 23328 2012-10-16 Pineapples are so delicious and excellent 2
8 66038 2012-10-26 Eager to see the Champions Cup next week 2
9 32717 2012-10-28 Hating this show 0
10 11345 2012-11-08 Eager for the food 1
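For reference, the counting described above can be sketched in base R roughly as follows. This is only a sketch under stated assumptions: matching is case-insensitive and on substrings (so "loved" counts for "love"), the data frames are rebuilt by hand from the examples above, and `count_word` is a hypothetical helper name, not something from an existing package.

```r
# Sample data copied from the examples above.
Tweets_with_Emotions <- data.frame(
  ID = c(58150, 12357, 50788, 66038, 18119, 48349, 23328, 66038, 32717, 11345),
  Date = as.Date(c("2012-09-12", "2012-09-28", "2012-10-04", "2012-10-11",
                   "2012-10-11", "2012-10-14", "2012-10-16", "2012-10-26",
                   "2012-10-28", "2012-11-08")),
  full_text = c("I love an excellent cookie",
                "Oranges are delicious and excellent",
                "Eager to visit Disneyland",
                "I wish my boyfriend would propose already",
                "Love Maggie Smith",
                "The movie was excellent, loved it.",
                "Pineapples are so delicious and excellent",
                "Eager to see the Champions Cup next week",
                "Hating this show",
                "Eager for the food"),
  stringsAsFactors = FALSE
)
Words_of_joy <- data.frame(
  word = c("eager", "champion", "delicious", "excellent", "love"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: occurrences of one word in each text.
# Both sides are lower-cased and the word is matched as a literal substring.
count_word <- function(word, texts) {
  hits <- gregexpr(tolower(word), tolower(texts), fixed = TRUE)
  # gregexpr() returns -1 when there is no match, so count positive positions.
  vapply(hits, function(h) sum(h > 0), integer(1))
}

# Count each lexicon word separately, then add the per-word counts together.
New_df <- Tweets_with_Emotions
New_df$joy_count <- Reduce(
  `+`,
  lapply(Words_of_joy$word, count_word, texts = New_df$full_text)
)

print(New_df[, c("full_text", "joy_count")])
```

On the ten sample rows this reproduces the joy_count column of the desired output. Because each word is counted on its own, a lexicon that contains both a word and a longer word starting with it would count the longer occurrence twice; whether that is wanted depends on the analysis.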
I've effectively run the emotion list through the Tweets so that it returns a yes or no as to whether any words from the emotion list are contained within the Tweets (no = 0, yes = 1). However, I cannot figure out how to count the matches and return the totals in a new column:
new_df <- Tweets_with_Emotions[stringr::str_detect(Tweets_with_Emotions$full_text, paste(Words_of_joy$word, collapse = '|')), ]
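One possible direction, sketched on the assumption that stringr is the package in use: `str_count()` accepts the same kind of pattern as `str_detect()` but returns the number of matches per string instead of TRUE/FALSE. The three-row data frame below is a stand-in built from the examples, and wrapping the pattern in `regex(..., ignore_case = TRUE)` is my assumption about the matching that is wanted.

```r
library(stringr)

# Small stand-in inputs taken from the examples in the question.
Tweets_with_Emotions <- data.frame(
  full_text = c("I love an excellent cookie",
                "The movie was excellent, loved it.",
                "Hating this show"),
  stringsAsFactors = FALSE
)
Words_of_joy <- data.frame(
  word = c("eager", "champion", "delicious", "excellent", "love"),
  stringsAsFactors = FALSE
)

# Same collapsed "word1|word2|..." pattern as before, made case-insensitive.
joy_pattern <- regex(paste(Words_of_joy$word, collapse = "|"),
                     ignore_case = TRUE)

# 0/1 flag: does the Tweet contain any joy word at all?
Tweets_with_Emotions$has_joy <-
  as.integer(str_detect(Tweets_with_Emotions$full_text, joy_pattern))

# Count: how many (non-overlapping) matches of the pattern per Tweet?
Tweets_with_Emotions$joy_count <-
  str_count(Tweets_with_Emotions$full_text, joy_pattern)

print(Tweets_with_Emotions)
```

Unlike subsetting with `str_detect()`, this keeps every row and adds the total as a column. Note that a single alternation pattern counts non-overlapping matches only, and it assumes the lexicon words contain no regex metacharacters.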
I'm extremely new to R (and Stack Overflow!) and have been struggling to figure this out for a few days, so any help would be incredibly appreciated!