
I use the following to compute the sentiment of 200 short sentences, without using a training data set:

for sentence in textblob.sentences:
    print(sentence.sentiment)

The analysis returns two values: polarity and subjectivity. From what I read online, the polarity score is a float within the range [-1.0, 1.0] where 0 indicates neutral, +1 a very positive attitude and -1 a very negative attitude. The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.
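To make those ranges concrete, here is a tiny helper I could use to label the scores (the function name and the 0.5 subjectivity threshold are my own choices for illustration, not part of TextBlob):

```python
def interpret(polarity, subjectivity):
    """Map TextBlob-style scores onto coarse labels (illustrative only)."""
    if polarity > 0:
        attitude = "positive"
    elif polarity < 0:
        attitude = "negative"
    else:
        attitude = "neutral"
    stance = "subjective" if subjectivity >= 0.5 else "objective"
    return attitude, stance

print(interpret(0.8, 0.9))   # a strongly positive, subjective sentence
print(interpret(0.0, 0.0))   # "neutral"/"objective" -- or just no scored words?
```

The (0.0, 0.0) case is exactly the ambiguity I am asking about below.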

So, now my question: How are those scores computed?

I get zeros for the polarity score of almost half of the phrases, and I am wondering whether a zero indicates neutrality or rather that the phrase does not contain any words that have a polarity. I have the same question for another sentiment analyzer: NaiveBayesAnalyzer.

Thank you for your help!
Marie


2 Answers


The TextBlob NaiveBayesAnalyzer is apparently based on NLTK's Naive Bayes classifier. The Naive Bayes algorithm in general is explained here: A simple explanation of Naive Bayes Classification

and its application to sentiment and objectivity is described here: http://nlp.stanford.edu/courses/cs224n/2009/fp/24.pdf

Basically you're right that certain words will be labeled something like "40% positive / 60% negative" based on how they were used in a body of training data (for NLTK's analyzer, the training data was movie reviews). Then the scores of all words in your sentence get multiplied to produce the sentence score.
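That multiplication step can be sketched roughly as follows. The per-word probabilities below are invented for illustration, not NLTK's actual estimates, and real implementations work in log space to avoid underflow:

```python
import math

# Hypothetical P(word | positive) and P(word | negative) pairs, as if learned
# from a labeled corpus. These numbers are made up for illustration.
word_probs = {
    "great": (0.40, 0.10),
    "movie": (0.25, 0.25),
    "boring": (0.05, 0.35),
}

def naive_bayes_polarity(words, prior_pos=0.5):
    # Sum log-probabilities instead of multiplying raw probabilities.
    log_pos = math.log(prior_pos)
    log_neg = math.log(1 - prior_pos)
    for w in words:
        if w in word_probs:          # words absent from training are skipped
            p_pos, p_neg = word_probs[w]
            log_pos += math.log(p_pos)
            log_neg += math.log(p_neg)
    # Normalize to P(positive | sentence) and map [0, 1] onto [-1, 1].
    p_pos = math.exp(log_pos) / (math.exp(log_pos) + math.exp(log_neg))
    return 2 * p_pos - 1

print(naive_bayes_polarity(["great", "movie"]))   # > 0: leans positive
print(naive_bayes_polarity(["the", "a"]))         # 0.0: no known words at all
```

Note how a sentence with no training-set words falls straight through to the prior, which is one plausible way a library could end up emitting an exact 0.0.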

I haven't tested, but I expect that if the library returns exactly 0.0, then your sentence didn't contain any words that had a polarity in the NLTK training set. I suspect the researchers didn't include them because 1) they were too rare in the training data or 2) they were known to be meaningless (such as "the", "a", "and", etc.).

That goes for the Naive Bayes analyzer. Regarding the PatternAnalyzer, the TextBlob docs say it's based on the "pattern" library, but it doesn't seem to document how it works. I suspect something similar is happening though.

    Thank you @Luke! For the subjectivity score, 0 should represent words that are very objective, so I am not sure whether 0 is given to a word that was not in the training data set or because the word is very objective. For example, for the sentence 'beyond doubt' the subjectivity score is 0 and the polarity score is 0, and I am not sure how to interpret this... I am thinking of using this analysis in a scientific paper but I need to better understand (and explain) how it is computed. – MarieJ Dec 30 '15 at 16:54
    All decent Naive Bayes algorithms use "additive smoothing" which means, for each word, they start the count of "objective" and "subjective" sentences at 1 (or some other fixed constant) instead of 0. This prevents words from getting a score of exactly 0% objective or 0% subjective. This tutorial on Naive Bayes discusses it more: http://www.nils-haldenwang.de/computer-science/machine-learning/how-to-apply-naive-bayes-classifiers-to-document-classification-problems – Luke Dec 31 '15 at 20:45
    Thus, I think it's safe to assume that, if the Naive Bayes algorithm outputs exactly -1.0 or 1.0 for a sentence, it is not because some word was labeled 0% positive. Rather, it means the sentence contained no training-set words, and the library authors chose an arbitrary output for that case. The above applies to Naive Bayes with the NLTK. However, it looks like TextBlob only uses that for polarity, and not for subjectivity: http://textblob.readthedocs.org/en/dev/_modules/textblob/en/sentiments.html#PatternAnalyzer.analyze For subjectivity, it seems to use the "pattern" library instead. – Luke Dec 31 '15 at 21:03
    The "pattern" library is pretty sparse on documentation, so I'm not sure how it calculates its "subjectivity": http://www.clips.ua.ac.be/pages/pattern-en#sentiment I would assume that something similar to the above is happening when it outputs exactly 0.0, but I can't verify that in their documentation. So, I wouldn't put too much trust in the "subjectivity" score unless you can find better documentation and a paper showing an accuracy rate for the underlying algorithm. – Luke Dec 31 '15 at 21:10
    In case it helps, this tutorial shows how the positive/negative sentiment analyzer is trained: http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/ At the bottom it lists a 72.8% accuracy rate on training data, which is pretty typical for sentiment analyzers -- it has some power but it's not highly accurate. – Luke Dec 31 '15 at 21:17
  • This is a valid link to the pattern library (as the URL mentioned above does not exist anymore): https://github.com/clips/pattern – Wok Jan 11 '21 at 11:16
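The additive smoothing mentioned in the comments above can be sketched like this (the counts and vocabulary size are toy numbers, not NLTK's actual training data):

```python
def smoothed_prob(count_in_class, total_in_class, vocab_size, alpha=1):
    # Laplace / additive smoothing: start every word count at alpha instead
    # of 0, so no word ever gets a probability of exactly 0 for a class.
    return (count_in_class + alpha) / (total_in_class + alpha * vocab_size)

# A word never seen in "objective" sentences still gets a nonzero probability:
p_unseen = smoothed_prob(count_in_class=0, total_in_class=1000, vocab_size=5000)
print(p_unseen)  # small, but strictly positive
```

This is why, as Luke notes, an extreme or zero score is more plausibly explained by "no training-set words found" than by a genuine 0% probability.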

According to TextBlob's creator, Steven Loria, TextBlob's sentiment analyzer delegates to pattern.en's sentiment module. Pattern.en itself uses a dictionary-based approach with a few heuristics to handle, e.g., negation. You can find the source here, which is a vendorized version of pattern.en's text module, with minor tweaks for Python 3 compatibility.
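As a rough sketch of what a dictionary-based approach with a negation heuristic can look like, consider the following. The mini-lexicon and the simple "flip the next scored word" rule are invented for illustration; they are not pattern.en's actual data or logic:

```python
# Invented mini-lexicon of (polarity, subjectivity) pairs -- not pattern.en's.
LEXICON = {
    "good": (0.7, 0.6),
    "bad": (-0.7, 0.7),
    "great": (0.8, 0.75),
}
NEGATIONS = {"not", "never", "no"}

def score(words):
    polarities, subjectivities = [], []
    negated = False
    for w in words:
        w = w.lower()
        if w in NEGATIONS:
            negated = True            # flip the next sentiment-bearing word
        elif w in LEXICON:
            pol, subj = LEXICON[w]
            polarities.append(-pol if negated else pol)
            subjectivities.append(subj)
            negated = False
    if not polarities:
        return (0.0, 0.0)             # no lexicon words: both scores default to 0
    return (sum(polarities) / len(polarities),
            sum(subjectivities) / len(subjectivities))

print(score("this movie is not good".split()))  # negation flips "good"
print(score("beyond doubt".split()))            # no lexicon words -> (0.0, 0.0)
```

The second call mirrors the asker's 'beyond doubt' example: under a scheme like this, (0.0, 0.0) means no lexicon words were found, not that the phrase was judged neutral and objective.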

    This should be marked as the right answer. Here is a link to several issues addressing this point on the library's GitHub: https://github.com/sloria/TextBlob/issues/344#issuecomment-732193942 – Arnold Vialfont Apr 18 '21 at 11:52