I was fiddling around with a few different algos for text sentiment analysis. So far, all of them were squirrelly except for one, which looks pretty accurate.
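from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Requires the VADER lexicon: run nltk.download('vader_lexicon') once beforehand
sid = SentimentIntensityAnalyzer()
# df is an existing DataFrame with a 'review_text' column of strings
df['sentiment'] = df['review_text'].apply(lambda x: sid.polarity_scores(x))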
That gives me dictionary results, like this:
{'neg': 0.315, 'neu': 0.593, 'pos': 0.093, 'compound': -0.7178}
{'neg': 0.215, 'neu': 0.556, 'pos': 0.229, 'compound': 0.0516}
{'neg': 0.373, 'neu': 0.133, 'pos': 0.493, 'compound': 0.2263}
{'neg': 0.242, 'neu': 0.547, 'pos': 0.211, 'compound': -0.1027}
{'neg': 0.31, 'neu': 0.69, 'pos': 0.0, 'compound': -0.6597}
I'm trying to figure out how to evaluate the last number in each row (-0.7178, 0.0516, 0.2263, -0.1027, -0.6597) and apply the following logic:
If compound <= 0 Then negative
ElseIf compound > .2 Then positive
Else neutral
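To make the rule concrete, here is a minimal sketch of it as a plain Python function. The name `label_from_compound` is just a placeholder, and the thresholds are the ones listed above (anything at or below 0 is negative, anything above 0.2 is positive, the rest is neutral).

```python
def label_from_compound(compound: float) -> str:
    """Map a compound score to a label using the thresholds described above."""
    if compound <= 0:
        return "negative"
    elif compound > 0.2:
        return "positive"
    else:
        return "neutral"


# The five compound values from the sample output above.
scores = [-0.7178, 0.0516, 0.2263, -0.1027, -0.6597]
labels = [label_from_compound(s) for s in scores]
print(labels)
# ['negative', 'neutral', 'positive', 'negative', 'negative']
```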
I tried to find a substring within the dictionary, like this:
sub = '''compound':'''
df['Indexes'] = df['sentiment'].str.find(sub)
df
I was thinking of finding the position of the key, grabbing the number that follows it, and then running the logic I described above. I'm starting to think that's not the right approach. What's the best way to solve this problem?
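One possible direction, sketched under the assumption that each cell in the `sentiment` column holds an actual Python dict (which is what `polarity_scores` returns) rather than a string: in that case a substring search has nothing to search, and the value can be looked up by its key instead. The small DataFrame below is a hypothetical stand-in built from three of the sample rows above.

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame; the dicts are copied
# from the sample output shown earlier.
df = pd.DataFrame({
    "sentiment": [
        {"neg": 0.315, "neu": 0.593, "pos": 0.093, "compound": -0.7178},
        {"neg": 0.215, "neu": 0.556, "pos": 0.229, "compound": 0.0516},
        {"neg": 0.373, "neu": 0.133, "pos": 0.493, "compound": 0.2263},
    ]
})

# Each cell is a dict, so pull the value out by key rather than by position.
df["compound"] = df["sentiment"].apply(lambda d: d["compound"])

print(df["compound"].tolist())
# [-0.7178, 0.0516, 0.2263]
```

With the score in its own numeric column, the thresholds can be applied to that column directly. Another option is to store only the number in the first place, for example `sid.polarity_scores(x)['compound']` inside the original `apply`.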