
I have a problem with this piece of code. I can't find the bug, but the results are clearly incorrect. After calling the function:


    data = sentimentfun(My_tweettext, positive_war, negative_war,
                        .progress = 'text')

I get this result: [screenshot of result]

The result is a data frame of the downloaded tweets (already cleaned), in which every second score returned by the sentiment function is the maximum value. For example, for 3 identical tweets I get a score of 0 twice and a score of 4771 once.
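
One possible explanation for the alternating pattern, offered only as an unverified hypothesis: if `My_tweettext` is a data frame (for example as returned by `read.csv()`) rather than a character vector, `laply()` iterates over its columns instead of its tweets, so the function returns one score per column, and `data.frame()` then recycles that short score vector down all the rows. The sketch below reproduces both effects in base R with made-up data; `tweets_df`, its column names, and the two score values are illustrative assumptions.

```r
# Made-up example: a data frame with an id column and a text column,
# similar to what read.csv() might return for the tweet file.
tweets_df <- data.frame(
  id   = 1:4,
  text = c("good day", "good day", "good day", "bad day"),
  stringsAsFactors = FALSE
)

# As far as I know, plyr's laply() splits its input with as.list(); for a
# data frame that yields one piece per column, so the scoring function would
# run twice here (once on the id column, once on the whole text column, so
# every word of every tweet counts toward one score), not once per tweet.
pieces <- as.list(tweets_df)
print(length(pieces))

# Two scores (illustrative values) for four rows: data.frame() recycles
# the shorter vector, which produces an alternating pattern.
scores <- c(0, 4771)
result <- data.frame(text = tweets_df, score = scores)
print(result$score)
```

Under this hypothesis, three identical consecutive tweets would receive 0, 4771, 0, which matches the pattern described above. Checking `class(My_tweettext)` or `str(My_tweettext)` would confirm or rule it out.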

Could someone smarter than me look at this code, check whether it is correct, and suggest how I can get correct results? I want to use the `tweettext` that I have already obtained.


    library(plyr)     # laply()
    library(stringr)  # str_split()
    
    sentimentfun = function(tweettext, pos, neg, .progress='none')
    {
      # Parameters
      # tweettext: character vector of texts to score, one tweet per element
      #            (a single column, not a whole data frame)
      # pos: vector of words of positive sentiment
      # neg: vector of words of negative sentiment
      # .progress: passed to laply() to control the progress bar
      
      scores = laply(tweettext,
                     function(singletweet, pos, neg)
                     {
                       singletweet = gsub("[[:punct:]]", "", singletweet)
                       singletweet = gsub("[[:cntrl:]]", "", singletweet)
                       singletweet = gsub("\\d+", "", singletweet)
    
                       tryTolower = function(x)
                       {
                         y = NA
                         try_error = tryCatch(tolower(x), error=function(e)e)
                         if (!inherits(try_error, "error"))
                           y = tolower(x)
                         return(y)
                       }
                       singletweet = sapply(singletweet, tryTolower)
                       word.list = str_split(singletweet, "\\s+")
                       words = unlist(word.list)
                       pos.matches = match(words, pos)
                       neg.matches = match(words, neg)
                       pos.matches = !is.na(pos.matches)
                       neg.matches = !is.na(neg.matches)
                       score = sum(pos.matches) - sum(neg.matches)
                       return(score)
                     }, pos, neg, .progress=.progress )
      sentiment.df = data.frame(text=tweettext, score=scores)
      return(sentiment.df)
    }
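
A minimal base-R sketch of the same scoring idea, assuming `tweettext` is a character vector with one tweet per element and `pos`/`neg` are character vectors of lower-case words. `score_tweets()` is a hypothetical name, not part of any package; it swaps `laply()`/`str_split()` for base `vapply()`/`strsplit()` so that exactly one score is produced per tweet, and it stops with an error if a data frame is passed in.

```r
# Hypothetical base-R variant of sentimentfun(): one score per tweet.
# Assumes `tweettext` is a character vector (one tweet per element) and
# `pos` / `neg` are character vectors of lower-case words.
score_tweets <- function(tweettext, pos, neg) {
  # A data frame is a list; refuse it instead of silently scoring columns
  if (is.list(tweettext)) {
    stop("tweettext must be a character vector (one column), not a data frame")
  }
  tweettext <- as.character(tweettext)

  score_one <- function(singletweet) {
    # Same cleaning steps as the original function
    singletweet <- gsub("[[:punct:]]", "", singletweet)
    singletweet <- gsub("[[:cntrl:]]", "", singletweet)
    singletweet <- gsub("\\d+", "", singletweet)
    # tolower() can fail on invalid multibyte input; give NA in that case
    singletweet <- tryCatch(tolower(singletweet),
                            error = function(e) NA_character_)
    if (is.na(singletweet)) return(NA_real_)
    words <- unlist(strsplit(singletweet, "\\s+"))
    # %in% is TRUE where a word appears in the lexicon
    sum(words %in% pos) - sum(words %in% neg)
  }

  # vapply() enforces exactly one numeric value per tweet
  scores <- vapply(tweettext, score_one, numeric(1), USE.NAMES = FALSE)
  data.frame(text = tweettext, score = scores, stringsAsFactors = FALSE)
}
```

With the objects from the question this would be called as, for example, `score_tweets(My_tweettext$text, positive_war, negative_war)`, where `text` stands in for whatever the tweet column is actually called.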

Sorry if this question is stupid, but I need this function to get data for my research.

Edit: I use Windows 10, and my RStudio version is 1.4.1103.

Here is a folder with the data:

- tweettext: Trudeau_tweettext.csv
- pos: positive-words.txt
- neg: negative-words.txt

Packages loaded:

    library(stringr)
    library(plyr)
    library(dplyr)
    library(tm)
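
A sketch for loading the three files as plain character vectors, under two assumptions: the word lists contain one word per line with `;`-prefixed comment lines (the layout of the widely used Hu and Liu opinion lexicon files, which these file names suggest), and the CSV has a column holding the tweet text. The helper names `read_word_list()` and `read_tweet_text()` and the default column name `text` are hypothetical.

```r
# Hypothetical helpers: load the inputs as plain character vectors.
# Assumes one word per line with ';'-prefixed comment lines in the word
# lists, and a CSV column holding the tweet text (the name is a guess).
read_word_list <- function(path) {
  # comment.char drops ';' comment lines; blank lines are skipped
  words <- scan(path, what = character(), comment.char = ";",
                quote = "", quiet = TRUE)
  tolower(words)
}

read_tweet_text <- function(path, text_col = "text") {
  df <- read.csv(path, stringsAsFactors = FALSE)
  if (!text_col %in% names(df)) {
    stop("column '", text_col, "' not found; available columns: ",
         paste(names(df), collapse = ", "))
  }
  # Return the column itself: a character vector, one tweet per element
  df[[text_col]]
}
```

Returning the column rather than the data frame is the point of the second helper: the scoring function then sees one tweet per element.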

I wish you all a lovely day (or night)!

Klaudia
  • It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. – MrFlick May 17 '22 at 20:08
  • Thank you for this tip! I have edited the question as far as I can and provided my data. – Klaudia May 17 '22 at 20:50
