I use VADER for sentiment analysis. When I add a single word to the VADER lexicon, it works: the newly added word is detected as positive or negative based on the value I give it. The code is below:
import nltk
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

sid_obj = SentimentIntensityAnalyzer()
new_word = {'counterfeit': -2, 'Good': 2}
sid_obj.lexicon.update(new_word)

sentence = "Company Caught Counterfeit."
sentiment_dict = sid_obj.polarity_scores(sentence)
tokenized_sentence = nltk.word_tokenize(sentence)

pos_word_list = []
neu_word_list = []
neg_word_list = []

# classify each token by its own compound score
for word in tokenized_sentence:
    compound = sid_obj.polarity_scores(word)['compound']
    if compound >= 0.1:
        pos_word_list.append(word)
    elif compound <= -0.1:
        neg_word_list.append(word)
    else:
        neu_word_list.append(word)

print('Positive:', pos_word_list)
print('Neutral:', neu_word_list)
print('Negative:', neg_word_list)
print("Overall sentiment dictionary is : ", sentiment_dict)
print("sentence was rated as ", sentiment_dict['neg']*100, "% Negative")
print("sentence was rated as ", sentiment_dict['neu']*100, "% Neutral")
print("sentence was rated as ", sentiment_dict['pos']*100, "% Positive")
print("Sentence Overall Rated As", end=" ")

# decide whether the whole sentence is positive, negative or neutral
if sentiment_dict['compound'] >= 0.05:
    print("Positive")
elif sentiment_dict['compound'] <= -0.05:
    print("Negative")
else:
    print("Neutral")
The output is as follows:
Positive: []
Neutral: ['Company', 'Caught', '.']
Negative: ['Counterfeit']
Overall sentiment dictionary is : {'neg': 0.6, 'neu': 0.4, 'pos': 0.0, 'compound': -0.4588}
sentence was rated as 60.0 % Negative
sentence was rated as 40.0 % Neutral
sentence was rated as 0.0 % Positive
Sentence Overall Rated As Negative
It works perfectly when one word is added to the lexicon. However, when I try to add multiple words from a CSV file using the code below, the word Counterfeit does not get added to my VADER lexicon:
import csv

new_word = {}
with open('Dictionary.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        new_word[row['Word']] = int(row['Value'])
print(new_word)
sid_obj.lexicon.update(new_word)
The code above prints the dictionary that is then added to the lexicon. The dictionary looks like this (it has about 2000 words, but I've only printed a few; it also contains Counterfeit):
{'CYBERATTACK': -2, 'CYBERATTACKS': -2, 'CYBERBULLYING': -2,
'CYBERCRIME': -2, 'CYBERCRIMES': -2, 'CYBERCRIMINAL': -2,
'CYBERCRIMINALS': -2, 'MISCHARACTERIZATION': -2,
'MISCLASSIFICATIONS': -2, 'MISCLASSIFY': -2, 'MISCOMMUNICATION': -2,
'MISPRICE': -2, 'MISPRICING': -2, 'STRICTLY': -2}
Running the same sentiment code after this update gives the following output:
Positive: []
Neutral: ['Company', 'Caught', 'Counterfeit', '.']
Negative: []
Overall sentiment dictionary is : {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
sentence was rated as 0.0 % Negative
sentence was rated as 100.0 % Neutral
sentence was rated as 0.0 % Positive
Sentence Overall Rated As Neutral
Where am I going wrong when adding multiple words to the lexicon? The CSV file has two columns: one with the word and the other with its value as a negative or positive number. Why is the word still identified as neutral? Any help will be appreciated. Thank you.
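One possible cause is the casing of the keys. VADER appears to lowercase each token before looking it up in the lexicon, which would explain why the lowercase key 'counterfeit' in the first snippet matched while the uppercase keys read from the CSV do not. The sketch below is only an illustration under that assumption: it reads the same Word and Value columns and lowercases each key before the lexicon is updated. The helper name load_lexicon_updates and the in-memory sample rows are hypothetical stand-ins, not part of VADER or of the real Dictionary.csv.

```python
import csv
import io


def load_lexicon_updates(csvfile):
    """Read 'Word' and 'Value' columns into a dict keyed by lowercase word."""
    updates = {}
    for row in csv.DictReader(csvfile):
        # Strip stray whitespace and lowercase the key so it can match the
        # lowercased token that VADER is assumed to look up.
        word = row['Word'].strip().lower()
        if word:
            updates[word] = float(row['Value'])
    return updates


# Hypothetical stand-in for Dictionary.csv, with the same two columns.
sample_csv = io.StringIO("Word,Value\nCOUNTERFEIT,-2\nCYBERATTACK,-2\n")
new_word = load_lexicon_updates(sample_csv)
print(new_word)

# With the real file, the update step from the question stays the same:
# with open('Dictionary.csv', newline='') as csvfile:
#     sid_obj.lexicon.update(load_lexicon_updates(csvfile))
```

If the keys were the problem, rerunning the sentiment code after this update should rate Counterfeit as negative again. If it does not, the next thing to check would be whether the Word column contains hidden characters, such as a byte order mark in the header.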