nltk NaiveBayesClassifier training for blogs sentiment analysis

Question

I scraped text from several blog posts about a specific topic. Most of what I have read about sentiment analysis is based on training a classifier to decide whether a text is positive or negative, as shown in this thread. My question is: where can I find a dictionary of words and their sentiments, e.g. nice: positive, bad: negative?

score 3 · Answer 1 · edited Jun 20 '20 at 09:12

What you are looking for is a sentiment lexicon. A sentiment lexicon is a dictionary of words in which each word has a corresponding sentiment score (ranging from very negative to very positive) or, as you mentioned, a tag such as good or bad (the latter is uncommon). There are several sentiment lexicons you could use, such as SentiWordNet, SentiStrength, and AFINN, to name a few. All three give you a sentiment score for each sentiment word, and of course you can simply set a condition: if a word has a negative score it is bad, and if it has a positive score it is good. The easiest of these to use is AFINN, which I recommend you start with. Later you can switch to a lexicon better suited to your application. You can find information about AFINN here and download it from here.
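
The sign-based rule above can be sketched in a few lines. This is a minimal sketch, assuming the lexicon is stored as plain text with one "word<TAB>score" pair per line, which is how the AFINN word lists are distributed as far as I know. The sample entries and the helper names `load_lexicon` and `label` are made up for illustration and are not copied from the real file.

```python
# Illustrative entries in a "word<TAB>score" layout. The values are
# placeholders for this sketch, not taken from the actual AFINN file.
SAMPLE_LEXICON_TEXT = "nice\t3\nbad\t-3\n"


def load_lexicon(lines):
    """Parse tab-separated 'word<TAB>score' lines into a dict."""
    lexicon = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # rsplit keeps multi-word entries intact; the score is the last field
        word, score = line.rsplit("\t", 1)
        lexicon[word] = int(score)
    return lexicon


def label(word, lexicon):
    """Map a word to 'positive', 'negative', 'neutral', or 'unknown'."""
    score = lexicon.get(word.lower())
    if score is None:
        return "unknown"  # word is not in the lexicon
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


lexicon = load_lexicon(SAMPLE_LEXICON_TEXT.splitlines())
for w in ["Nice", "bad", "table"]:
    print(w, "->", label(w, lexicon))
```

To use a downloaded lexicon file, you would open it and pass the file object to `load_lexicon` in place of the sample lines.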

Let me know if you have further questions.

score 0 · Answer 2 · answered Sep 10 '18 at 10:01

If you are working with English text, you can use the dictionary of polarity scores that ships with a pre-trained model. I suggest VADER from NLTK, as it is simple to use.

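import nltk
from nltk.sentiment import vader

nltk.download('vader_lexicon')  # one-time download of the lexicon data
analyzer = vader.SentimentIntensityAnalyzer()
words_with_sentiments = analyzer.make_lex_dict()  # maps word -> polarity score
len(words_with_sentiments)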

Output is 7502 entries.

The output of .make_lex_dict() is a dictionary, which has this structure:

{...
'agree': 1.5,
 'agreeability': 1.9,
 'agreeable': 1.8,
 'agreeableness': 1.8,
 'agreeablenesses': 1.3,
 'agreeably': 1.6,
 'agreed': 1.1,
 'agreeing': 1.4,
 'agreement': 2.2,
 'agreements': 1.1,
 'agrees': 0.8,
 'alarm': -1.4
...}

In principle, positive values correspond to positive sentiment and negative values to negative sentiment. You can then use this dictionary as a lookup table for the strings you are parsing.
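
One way to do that lookup is sketched below. This is only a rough illustration, not VADER's own scoring method: it splits the text with a simple regular expression and sums the scores of the words it finds. The function name `score_text` and the tokenization rule are my own assumptions, and the small sample dictionary reuses three scores from the excerpt above so the snippet runs without NLTK.

```python
import re

# Three scores copied from the lexicon excerpt above; the full dictionary
# returned by make_lex_dict() is much larger.
SAMPLE_LEXICON = {"agree": 1.5, "agreeable": 1.8, "alarm": -1.4}


def score_text(text, lexicon):
    """Return (total, matches): the summed score and the (word, score) pairs found."""
    # Naive tokenization (an assumption of this sketch): lowercase the text
    # and keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    matches = [(tok, lexicon[tok]) for tok in tokens if tok in lexicon]
    total = sum(score for _, score in matches)
    return total, matches


total, matches = score_text("I agree, but the alarm was loud.", SAMPLE_LEXICON)
print(matches)
print("positive" if total > 0 else "negative" if total < 0 else "neutral")
```

With NLTK installed, you could pass `words_with_sentiments` instead of `SAMPLE_LEXICON`. A plain sum ignores negation and intensifiers such as "not" or "very"; if that matters, the analyzer's `polarity_scores(text)` method scores whole sentences and is probably the better starting point.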
