
I am attempting to use a lexicon-based scoring method to do sentiment analysis on some texts. I borrowed my code directly from http://analyzecore.com/2014/04/28/twitter-sentiment-analysis/ after reading the Stack Overflow post "R sentiment analysis with phrases in dictionaries".

Here's a brief summary of my data set:

> summary(data$text)
   Length     Class      Mode 
       30 character character 
> str(data$text)
 chr [1:30] "Hey everybody, are you guys free on Sunday for a game play + dinner afterwards? I'll reserve a"| __truncated__ ...

And here's the code I'm using:

require(plyr)  
require(stringr)
require(data.table)
score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
  scores = laply(sentences, function(sentence, pos.words, neg.words) {

    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    # and convert to lower case:
    sentence = tolower(sentence)

    # split into words. str_split is in the stringr package
    word.list = str_split(sentence, '\\s+')
    # sometimes a list() is one level of hierarchy too much
    words = unlist(word.list)

    # compare our words to the dictionaries of positive & negative terms
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)

    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)

    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score = (sum(pos.matches) - sum(neg.matches))

    return(score)
  } , pos.words, neg.words, .progress=.progress)

  scores.df = data.frame(score = scores, text = sentences)
  return(scores.df)
}
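To help isolate the problem, the same scoring logic can be sketched in base R without plyr or stringr. This is a rewrite for comparison, not the analyzecore original. It assumes 'pos.words' and 'neg.words' are plain character vectors, and the function name 'score_sentiment_base' and the toy word lists are hypothetical:

```r
# Base-R sketch of the same lexicon scoring (no plyr/stringr needed).
# Assumption: pos.words and neg.words are plain character vectors.
score_sentiment_base = function(sentences, pos.words, neg.words) {
  scores = vapply(sentences, function(sentence) {
    # strip punctuation, control characters and digits, then lower-case
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    sentence = tolower(sentence)
    # split on runs of whitespace into individual words
    words = unlist(strsplit(sentence, '\\s+'))
    # %in% gives TRUE/FALSE per word; sum() counts the TRUEs as integers
    sum(words %in% pos.words) - sum(words %in% neg.words)
  }, integer(1), USE.NAMES = FALSE)
  data.frame(score = scores, text = sentences, stringsAsFactors = FALSE)
}

# Hypothetical toy lexicons, just for illustration
pos = c('awesome', 'good', 'free')
neg = c('failed', 'bad')
res = score_sentiment_base(c('This place is awesome!', 'I failed in the exam.'), pos, neg)
print(res)
```

If this version scores the toy sentences as expected but the real run still gives zeros, the word lists themselves are the likely culprit.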

I am using Bing Liu's opinion lexicon, and I loaded the positive and negative word lists as:

pos_BL = read.table(file = 'positive-words.txt', stringsAsFactors = F)
neg_BL = read.table(file = 'negative-words.txt', stringsAsFactors = F)

And here's the code I used to run the data and the dictionaries through the scoring function:

score_result = score.sentiment(sentences = data$text, 
                               pos.words = pos_BL, 
                               neg.words = neg_BL, 
                               .progress= 'text')

However, no matter what I do, all 30 of my strings get a score of 0 (see the table below):

> table(score_result$score)
 0 
30 

I am out of ideas about what to fix (I did find and correct several errors in my own code before posting this question). Any help is much appreciated!

alwaysaskingquestions

2 Answers


An example using the 'polarity()' function from the qdap package:

library(qdap)
texts = list(a = 'This place is awesome', b = 'I failed in the exam')
lapply(texts, polarity)
Chirayu Chamoli
  • Hello Chirayu, I tried to load the qdap package, but no matter how many times I try, it always says "Error: there is no package called ‘qdap’", even though all the other necessary dependency packages load. What could the issue be? Thank you! – alwaysaskingquestions Jun 22 '16 at 18:21
  • You can download the package manually from CRAN: https://cran.r-project.org/web/packages/qdap/index.html – Chirayu Chamoli Jun 23 '16 at 08:49
  • Hi Chirayu. I am a beginner here, so please bear with my questions... I opened the link you shared, but after reading that page I am still not sure how to load the package :( – alwaysaskingquestions Jun 25 '16 at 01:53
  • Download the Windows binary (r-release) from that page and place it in the library where all your packages are, then load it with library(qdap). This won't install the dependencies, though; you have to install those manually too, or use a command. – Chirayu Chamoli Jun 25 '16 at 06:55
  • I finally got it to work! It's because it has to be "library(qdap)", not "library('qdap')". Thank you anyway! – alwaysaskingquestions Jul 12 '16 at 00:21
  • That's what I mentioned, right? library(qdap). Anyway, good that it worked for you. – Chirayu Chamoli Jul 13 '16 at 04:34

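Make sure you pass character vectors, not a table or data frame, as the 'pos.words' and 'neg.words' arguments of 'score.sentiment'. read.table() returns a one-column data frame, and match() does not treat it as a list of words: the whole column is coerced to a single string, so no word ever matches and every score comes out 0 (it can also run more slowly). Try something like this: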

score_result = score.sentiment(sentences = data$text, 
                               pos.words = as.character(pos_BL[ , 1]), 
                               neg.words = as.character(neg_BL[ , 1]), 
                               .progress= 'text')

The 'as.character()' call is probably unnecessary: with stringsAsFactors = F, read.table() already returns the column as a character vector.
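To see why the data frame produces zeros, the sketch below compares match() against a small one-column data frame and against its column. It also shows a hypothetical 'read_lexicon()' helper that reads a word list straight into a character vector with readLines(), dropping blank lines and lines starting with ';'. Treating ';' as the comment marker is an assumption about the header format, so check your copies of the files. A temporary file stands in for positive-words.txt:

```r
# A one-column data frame, shaped like the result of read.table()
lex_df = data.frame(V1 = c('awesome', 'good'), stringsAsFactors = FALSE)
words = c('this', 'is', 'awesome')

# match() coerces the whole column to one deparsed string, so nothing matches
bad = match(words, lex_df)        # NA NA NA
# Matching against the column vector itself works as expected
good = match(words, lex_df[, 1])  # NA NA 1

# Hypothetical helper: one word per line; skips blank and ';'-prefixed lines
# (the ';' comment marker is an assumption about the lexicon files)
read_lexicon = function(path) {
  lines = trimws(readLines(path, warn = FALSE))
  lines[nzchar(lines) & !startsWith(lines, ';')]
}

# Demo with a temporary file standing in for positive-words.txt
tmp = tempfile(fileext = '.txt')
writeLines(c(';; header comment', '', 'a+', 'abound', 'awesome'), tmp)
pos_words = read_lexicon(tmp)
print(pos_words)
```

With vectors like that, the original call can stay unchanged apart from the arguments, e.g. pos.words = read_lexicon('positive-words.txt').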