I am attempting to conduct emotional sentiment analysis of a large corpus of Tweets (91k) using an external list of emotionally charged words (from the NRC Emotion Lexicon). To do this, I want to count the total number of times any word from the words-of-joy list appears in each Tweet. Ideally, this would be a partial match on the word rather than an exact match (so "loved" counts as a match for "love"). I would like the total to appear in a new column in the df.

The df and column name for the Tweets are Tweets_with_Emotions$full_text and the list is Words_of_joy$word.

Example 1

> head(Tweets_with_Emotions, n=10)
  ID       Date      full_text
1  58150 2012-09-12  I love an excellent cookie 
2  12357 2012-09-28  Oranges are delicious and excellent
3  50788 2012-10-04  Eager to visit Disneyland 
4  66038 2012-10-11  I wish my boyfriend would propose already
5  18119 2012-10-11  Love Maggie Smith
6  48349 2012-10-14  The movie was excellent, loved it.
7  23328 2012-10-16  Pineapples are so delicious and excellent
8  66038 2012-10-26  Eager to see the Champions Cup next week
9  32717 2012-10-28  Hating this show
10 11345 2012-11-08  Eager for the food

Example 2

> head(Words_of_joy, n=5)
    word
1   eager
2   champion
3   delicious
4   excellent
5   love

Desired output

> head(New_df, n=10)
  ID       Date      full_text                                     joy_count
1  58150 2012-09-12  I love an excellent cookie                    2
2  12357 2012-09-28  Oranges are delicious and excellent           2
3  50788 2012-10-04  Eager to visit Disneyland                     1
4  66038 2012-10-11  I wish my boyfriend would propose already     0
5  18119 2012-10-11  Love Maggie Smith                             1
6  48349 2012-10-14  The movie was excellent, loved it.            2 
7  23328 2012-10-16  Pineapples are so delicious and excellent     2
8  66038 2012-10-26  Eager to see the Champions Cup next week      2
9  32717 2012-10-28  Hating this show                              0
10 11345 2012-11-08  Eager for the food                            1

I've managed to run the emotion list against the Tweets so that it returns a yes or no as to whether any word from the emotion list is contained within each Tweet (no = 0, yes = 1). However, I cannot figure out how to count the matches and return the totals in a new column.

new_df <- Tweets_with_Emotions[stringr::str_detect(Tweets_with_Emotions$full_text, paste(Words_of_joy$word, collapse = '|')),]
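
For reference, here is a minimal sketch of one way the per-Tweet count might be obtained. It assumes the stringr package is installed, that the lexicon words contain no regex metacharacters, and that case-insensitive partial matching is wanted (so "Eager" and "loved" both count). The small data frames are stand-ins built from the examples above, not the real data.

```r
library(stringr)

# Stand-in data copied from the examples in the question (not the full corpus).
Tweets_with_Emotions <- data.frame(
  ID = c(58150, 12357, 50788, 66038, 18119, 48349, 23328, 66038, 32717, 11345),
  full_text = c(
    "I love an excellent cookie",
    "Oranges are delicious and excellent",
    "Eager to visit Disneyland",
    "I wish my boyfriend would propose already",
    "Love Maggie Smith",
    "The movie was excellent, loved it.",
    "Pineapples are so delicious and excellent",
    "Eager to see the Champions Cup next week",
    "Hating this show",
    "Eager for the food"
  ),
  stringsAsFactors = FALSE
)
Words_of_joy <- data.frame(
  word = c("eager", "champion", "delicious", "excellent", "love"),
  stringsAsFactors = FALSE
)

# Build a single alternation pattern: "eager|champion|delicious|excellent|love".
joy_pattern <- str_c(Words_of_joy$word, collapse = "|")

# str_count() returns the number of pattern matches in each string.
# regex(ignore_case = TRUE) lets "Eager" match the lower-case entry "eager".
Tweets_with_Emotions$joy_count <- str_count(
  Tweets_with_Emotions$full_text,
  regex(joy_pattern, ignore_case = TRUE)
)

print(Tweets_with_Emotions[, c("full_text", "joy_count")])
```

Because this is a substring match, a word such as "glove" would also count as a hit for "love". If whole-word matching turns out to be preferable, wrapping the pattern as `\\b(...)\\b` is one option, at the cost of no longer matching "loved" or "Champions". Matches also do not overlap, so if the lexicon holds both a word and a longer word that contains it, a single occurrence is counted once.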

I'm extremely new to R (and stackoverflow!) and have been struggling to figure this out for a few days so any help would be incredibly appreciated!

Katie