I have a problem with this piece of code. I can't find a bug, but the results are clearly incorrect. After calling the function:
data = sentimentfun(My_tweettext, positive_war, negative_war,
                    .progress = 'text')
I get this: [screenshot of the result]
The result is a data frame of downloaded tweets (already cleaned) in which every second score returned by the sentiment function is the maximum. For example, for 3 identical tweets I get a score of 0 twice and a score of 4771 once.
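A minimal base R sketch that reproduces this alternating pattern with toy data, under one assumption: that `My_tweettext` is a two-column data frame (a row-index column plus the tweet text) rather than a character vector. Because a data frame is a list of columns, `laply()` would call the scoring function once per column: the index column scores 0 (its digits are stripped) and the text column scores all tweets pooled together (a very large number). `data.frame(text = tweettext, score = scores)` then recycles those two scores down the rows. The data, the lexicons, and the `score_text()` helper below are hypothetical, and `sapply()` stands in for `laply()` so the example runs without plyr.

```r
# Toy stand-ins (hypothetical): a data frame with an index column X and a text
# column, plus tiny lexicons. The real CSV and word lists are not reproduced here.
tweets_df <- data.frame(
  X = 1:4,
  text = c("good great day", "good great day", "bad day", "good great day"),
  stringsAsFactors = FALSE
)
pos <- c("good", "great")
neg <- "bad"

# Simplified version of the question's inner scoring function
score_text <- function(x, pos, neg) {
  x <- tolower(gsub("[[:punct:]]|\\d+", "", x))  # digits vanish from column X
  words <- unlist(strsplit(x, "\\s+"))
  sum(words %in% pos) - sum(words %in% neg)
}

# Iterating over a data frame visits its COLUMNS, so there are only 2 scores:
# 0 for the index column and one pooled score for all tweets together
scores <- unname(sapply(tweets_df, score_text, pos = pos, neg = neg))
scores          # 0 5

# data.frame() recycles the 2 scores over the 4 rows, so they alternate
result <- data.frame(text = tweets_df, score = scores)
result$score    # 0 5 0 5
```

If this assumption matches the real data, passing only the text column (for example `My_tweettext$text`, where the column name is a guess) would give one score per tweet.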
Could someone smarter than me look at this code, check it for correctness, and suggest how I can get correct results? I want to keep using the tweettext I have already collected.
sentimentfun = function(tweettext, pos, neg, .progress = 'none')
{
  # Parameters
  # tweettext: character vector of tweet texts to score (one element per tweet)
  # pos: vector of words of positive sentiment
  # neg: vector of words of negative sentiment
  # .progress: passed to laply() to control the progress bar
  scores = laply(tweettext,
    function(singletweet, pos, neg)
    {
      # remove punctuation, control characters and digits
      singletweet = gsub("[[:punct:]]", "", singletweet)
      singletweet = gsub("[[:cntrl:]]", "", singletweet)
      singletweet = gsub("\\d+", "", singletweet)
      # lowercase, returning NA instead of failing when tolower() errors
      tryTolower = function(x)
      {
        y = NA
        try_error = tryCatch(tolower(x), error = function(e) e)
        if (!inherits(try_error, "error"))
          y = try_error
        return(y)
      }
      singletweet = sapply(singletweet, tryTolower)
      # split into words
      word.list = str_split(singletweet, "\\s+")
      words = unlist(word.list)
      # TRUE where a word appears in the lexicon
      pos.matches = !is.na(match(words, pos))
      neg.matches = !is.na(match(words, neg))
      score = sum(pos.matches) - sum(neg.matches)
      return(score)
    }, pos, neg, .progress = .progress)
  sentiment.df = data.frame(text = tweettext, score = scores)
  return(sentiment.df)
}
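For comparison, a minimal base R sketch of the same positive-minus-negative word count that scores each element of a character vector separately and stops early when it is given a data frame or factor. `score_tweets()` is a hypothetical name. It uses `strsplit()` and `%in%` instead of `str_split()` and `match()`, and it omits the `tryTolower()` guard for badly encoded text, so it is a simplified variant rather than a drop-in replacement.

```r
# Hypothetical helper: score each tweet in a character vector separately.
score_tweets <- function(tweettext, pos, neg) {
  # Refuse data frames, factors, etc. instead of silently scoring columns
  if (!is.character(tweettext)) {
    stop("tweettext must be a character vector (e.g. a data frame's text column), got: ",
         paste(class(tweettext), collapse = ", "))
  }
  # Same cleaning steps as sentimentfun(): punctuation, control chars, digits
  clean <- gsub("[[:punct:]]", "", tweettext)
  clean <- gsub("[[:cntrl:]]", "", clean)
  clean <- tolower(gsub("\\d+", "", clean))
  # strsplit() returns one list element per tweet
  words <- strsplit(clean, "\\s+")
  scores <- vapply(words, function(w) sum(w %in% pos) - sum(w %in% neg),
                   numeric(1))
  data.frame(text = tweettext, score = scores, stringsAsFactors = FALSE)
}

# Usage with toy lexicons: identical tweets get identical scores
res <- score_tweets(c("Good day!", "Bad, bad day", "Good day!"),
                    pos = "good", neg = "bad")
res$score  # 1 -2 1
```

The type check is the main design choice: a data frame passed by mistake raises an error instead of producing plausible-looking but recycled scores.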
Sorry if this question is stupid, but I need this function to get data for my research.
Edit: I use Windows 10, and my RStudio version is 1.4.1103.
tweettext: (Trudeau_tweettext.csv)
pos: (positive-words.txt)
neg: (negative-words.txt)
library(stringr)
library(plyr)
library(dplyr)
library(tm)
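A sketch of where an extra column can come from, assuming the CSV was written with `write.csv()` and read back with `read.csv()`: by default `write.csv()` stores row names as an unnamed first column, which `read.csv()` then names `X`. The file contents, the column name `text`, and the lexicon format (comment lines starting with ";", as in the Hu and Liu opinion lexicon files) are assumptions; temporary files stand in for the real `Trudeau_tweettext.csv` and word lists.

```r
# Hypothetical CSV round trip: write.csv() writes row names as an unnamed
# first column by default, and read.csv() names that column "X"
tmp_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(text = c("Good day!", "Bad day")), tmp_csv)
tweets_df <- read.csv(tmp_csv, stringsAsFactors = FALSE)
names(tweets_df)            # "X" "text"

# Pass only the tweet column (the name "text" is an assumption)
tweet_vector <- tweets_df$text
is.character(tweet_vector)  # TRUE

# Lexicon file, ASSUMING comment lines start with ";" (Hu & Liu style)
tmp_lex <- tempfile(fileext = ".txt")
writeLines(c(";; comment header", "", "good", "great"), tmp_lex)
pos_words <- readLines(tmp_lex)
pos_words <- pos_words[pos_words != "" & !startsWith(pos_words, ";")]
pos_words                   # "good" "great"
```

If the first column really is just a row index, `read.csv(..., row.names = 1)` is another way to keep it out of the text data.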
I wish you all a lovely day (or night)!