-1

I was fiddling around with a few different algorithms for text sentiment analysis. So far, all of them were squirrelly except for one, which looks pretty accurate.

from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()
df['sentiment'] = df['review_text'].apply(lambda x: sid.polarity_scores(x))

That gives me dictionary results, like this:

{'neg': 0.315, 'neu': 0.593, 'pos': 0.093, 'compound': -0.7178}
{'neg': 0.215, 'neu': 0.556, 'pos': 0.229, 'compound': 0.0516}
{'neg': 0.373, 'neu': 0.133, 'pos': 0.493, 'compound': 0.2263}
{'neg': 0.242, 'neu': 0.547, 'pos': 0.211, 'compound': -0.1027}
{'neg': 0.31, 'neu': 0.69, 'pos': 0.0, 'compound': -0.6597}

I'm trying to figure out how to evaluate the last number in each row (-0.7178, 0.0516, 0.2263, -0.1027, -0.6597) and apply the following logic:

If compound <= 0 Then negative
ElseIf compound > .2 Then positive
Else neutral

I tried to find a substring within the dictionary, like this:

sub = '''compound':'''
df['Indexes'] = df['sentiment'].str.find(sub)  
df 

I was thinking of finding the position, then getting the last number, and then running the logic I described above. I'm starting to think that's not the right approach. What's the best way to solve this problem?

ASH
  • Do you get a string or a dictionary `{'neg': 0.315, 'neu': 0.593, 'pos': 0.093, 'compound': -0.7178}`? Maybe you need `apply(lambda x: if x['compound'] <= 0 : ...)` – furas Feb 10 '20 at 23:37
  • Sorry, it's a dictionary. I'll update my question. – ASH Feb 10 '20 at 23:38
  • If you have a dictionary then maybe you need `df['sentiment'].apply(lambda x: if x['compound'] <= 0 : ...)` – furas Feb 10 '20 at 23:40
  • You get a Series of dictionaries? In that case you can use `.map()` or `.apply()`, right? _I tried to find a substring within the dictionary, like this:_ https://realpython.com/python-dicts/ – AMC Feb 10 '20 at 23:40
  • Do you mean it should be like this? df['sentiment_label'] = df['sentiment'].apply(lambda x: if x['compound'] <= 0 : 'negative') That's giving me: SyntaxError: invalid syntax – ASH Feb 10 '20 at 23:44
  • It is not complete; it also needs an `else`, which may need another `if/else` – furas Feb 10 '20 at 23:45
  • Is this not a duplicate of https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column or https://stackoverflow.com/questions/26886653/pandas-create-new-column-based-on-values-from-other-columns-apply-a-function-o?rq=1 ? – AMC Feb 11 '20 at 01:37

3 Answers

1
# data = df['sentiment'] I just abstracted it to data so it looks better.

data = [
{'neg': 0.315, 'neu': 0.593, 'pos': 0.093, 'compound': -0.7178},
{'neg': 0.215, 'neu': 0.556, 'pos': 0.229, 'compound': 0.0516},
{'neg': 0.373, 'neu': 0.133, 'pos': 0.493, 'compound': 0.2263},
{'neg': 0.242, 'neu': 0.547, 'pos': 0.211, 'compound': -0.1027},
{'neg': 0.31, 'neu': 0.69, 'pos': 0.0, 'compound': -0.6597}
]

def evaluate(num):
    # Thresholds follow the question: <= 0 is negative, > 0.2 is positive
    if num <= 0:
        return 'negative'
    elif num > 0.2:
        return 'positive'
    else:
        return 'neutral'


for item in data:
    num = item['compound']
    print(num, ' is', evaluate(num))

output:

-0.7178  is negative
0.0516  is neutral
0.2263  is positive
-0.1027  is negative
-0.6597  is negative
aviya.developer
  • Your example works, aviya, but how can I apply this over a field in a dataframe? That's where I am stuck. It seems like you have a list. I have a dictionary within a field in a dataframe; df['sentiment']. I can replace this field, or add a new field next to this one. It doesn't matter. I want to do whatever is easier. Thanks. – ASH Feb 10 '20 at 23:55
1

You can use apply() with a function that takes x['compound'] and converts it to "negative", "positive" or "neutral".

def convert(x):
    if x <= 0:
        return "negative"
    elif x > .2:
        return "positive"
    else:
        return "neutral"

df['result'] = df['sentiment'].apply(lambda x:convert(x['compound']))
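
For larger frames, a vectorized variant may be worth trying. The sketch below is not part of the original answer: it assumes the column holds dictionaries, adds a helper column named compound (an illustrative name), and uses numpy.select with the same thresholds as the question. The sample rows are copied from the question's output and stand in for real VADER results.

```python
import numpy as np
import pandas as pd

# Stand-in data: three of the dictionaries shown in the question
df = pd.DataFrame({
    'sentiment': [
        {'neg': 0.315, 'neu': 0.593, 'pos': 0.093, 'compound': -0.7178},
        {'neg': 0.215, 'neu': 0.556, 'pos': 0.229, 'compound': 0.0516},
        {'neg': 0.373, 'neu': 0.133, 'pos': 0.493, 'compound': 0.2263},
    ]
})

# Pull the compound score out of each dictionary into a numeric column
df['compound'] = df['sentiment'].apply(lambda d: d['compound'])

# Conditions are checked in order; rows matching neither get the default label
conditions = [df['compound'] <= 0, df['compound'] > 0.2]
labels = ['negative', 'positive']
df['result'] = np.select(conditions, labels, default='neutral')

print(df[['compound', 'result']])
```

Keeping the score in its own numeric column also makes it easy to change the thresholds later without re-running the analyzer.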

Minimal working code

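import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import pandas as pd

# One-time download of the VADER lexicon (skip if it is already installed)
nltk.download('vader_lexicon')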

def convert(x):
    if x <= 0:
        return "negative"
    elif x > .2:
        return "positive"
    else:
        return "neutral"

sid = SentimentIntensityAnalyzer()

df = pd.DataFrame({
    'review_text': ['bad', 'ok', 'fun', 'neutral']
})

df['sentiment'] = df['review_text'].apply(lambda x: sid.polarity_scores(x))

df['result'] = df['sentiment'].apply(lambda x:convert(x['compound']))

print(df[['review_text', 'result']])

Result

  review_text    result
0         bad  negative
1          ok  positive
2         fun  positive
3     neutral  negative
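
If the other scores are needed as well, one possible variation is to expand the dictionaries into separate columns. This is only a sketch under the assumption that every row holds a dictionary with the same keys; the review_text values are placeholders, the scores are copied from the question, and the name scores is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'review_text': ['first review', 'second review'],  # placeholder text
    'sentiment': [
        {'neg': 0.315, 'neu': 0.593, 'pos': 0.093, 'compound': -0.7178},
        {'neg': 0.373, 'neu': 0.133, 'pos': 0.493, 'compound': 0.2263},
    ],
})

# Build one column per dictionary key, keeping the original row index
scores = pd.DataFrame(df['sentiment'].tolist(), index=df.index)

# Attach the new columns next to the existing ones
df = df.join(scores)

print(df[['review_text', 'neg', 'neu', 'pos', 'compound']])
```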
furas
0

It looks like a dict, not a str. If it's a dict, you can get the compound value from each row like this:

df['sentiment'].apply(lambda d: d['compound'])

If it's a str, you can split each string and take the last part. Example:

df['Indexes'] = df['sentiment'].str.split("'compound': ").str[1].str.rstrip('}').astype(float)
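
If the column really does hold strings, another option is to parse each string back into a dictionary instead of splitting it. The sketch below assumes the strings are valid Python dict literals like the ones printed in the question; the sample rows and the column name compound are illustrative.

```python
import ast
import pandas as pd

# Stand-in column holding the dictionaries as text rather than dict objects
df = pd.DataFrame({
    'sentiment': [
        "{'neg': 0.315, 'neu': 0.593, 'pos': 0.093, 'compound': -0.7178}",
        "{'neg': 0.31, 'neu': 0.69, 'pos': 0.0, 'compound': -0.6597}",
    ]
})

# literal_eval parses Python literals (dicts, numbers, strings) without executing code
parsed = df['sentiment'].apply(ast.literal_eval)

# Once parsed, the compound score is a normal dictionary lookup
df['compound'] = parsed.apply(lambda d: d['compound'])

print(df['compound'].tolist())
```

This avoids depending on the exact position of 'compound' inside the string.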