How can I evaluate the last number in a dictionary and replace the number with text?

Question

I've been experimenting with a few different algorithms for text sentiment analysis. So far, all of them gave erratic results except one, VADER from NLTK, which looks pretty accurate.

from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()
df['sentiment'] = df['review_text'].apply(lambda x: sid.polarity_scores(x))

That gives me dictionary results, like this:

{'neg': 0.315, 'neu': 0.593, 'pos': 0.093, 'compound': -0.7178}
{'neg': 0.215, 'neu': 0.556, 'pos': 0.229, 'compound': 0.0516}
{'neg': 0.373, 'neu': 0.133, 'pos': 0.493, 'compound': 0.2263}
{'neg': 0.242, 'neu': 0.547, 'pos': 0.211, 'compound': -0.1027}
{'neg': 0.31, 'neu': 0.69, 'pos': 0.0, 'compound': -0.6597}

I'm trying to figure out how to evaluate the last number in each row (-0.7178, 0.0516, 0.2263, -0.1027, -0.6597) and apply the following logic:

If compound <= 0 Then negative
ElseIf compound > .2 Then positive
Else neutral

I tried to find a substring within the dictionary, like this:

sub = '''compound':'''
df['Indexes'] = df['sentiment'].str.find(sub)  
df

I was thinking of finding the position of the key, then getting the last number, and then running the logic I described above. I'm starting to think that's not the right approach. What's the best way to solve this problem?

Do you get a string or a dictionary `{'neg': 0.315, 'neu': 0.593, 'pos': 0.093, 'compound': -0.7178}`? Maybe you need `apply(lambda x: if x['compound'] < = 0 : ...)` — furas, Feb 10 '20 at 23:37
If you have a dictionary, then maybe you need `df['sentiment'].apply(lambda x: if x['compound'] < = 0 : ...)` — furas, Feb 10 '20 at 23:40
You get a Series of dictionaries? In that case you can use `.map()` or `.apply()`, right? _I tried to find a substring within the dictionary, like this:_ https://realpython.com/python-dicts/ — AMC, Feb 10 '20 at 23:40
Do you mean it should be like this? df['sentiment_label'] = df['sentiment'].apply(lambda x: if x['compound'] <= 0 : 'negative') That's giving me: SyntaxError: invalid syntax — ASH, Feb 10 '20 at 23:44
It is not complete: it also needs an `else`, which may need another `if/else` — furas, Feb 10 '20 at 23:45
Is this not a duplicate of https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column or https://stackoverflow.com/questions/26886653/pandas-create-new-column-based-on-values-from-other-columns-apply-a-function-o?rq=1 ? — AMC, Feb 11 '20 at 01:37

score 1 · Answer 1 · answered Feb 10 '20 at 23:44

# data = df['sentiment'] I just abstracted it to data so it looks better.

data = [
{'neg': 0.315, 'neu': 0.593, 'pos': 0.093, 'compound': -0.7178},
{'neg': 0.215, 'neu': 0.556, 'pos': 0.229, 'compound': 0.0516},
{'neg': 0.373, 'neu': 0.133, 'pos': 0.493, 'compound': 0.2263},
{'neg': 0.242, 'neu': 0.547, 'pos': 0.211, 'compound': -0.1027},
{'neg': 0.31, 'neu': 0.69, 'pos': 0.0, 'compound': -0.6597}
]

def evaluate(num):
    # Thresholds from the question: <= 0 is negative, > 0.2 is positive
    if num <= 0:
        return 'negative'
    elif num > 0.2:
        return 'positive'
    else:
        return 'neutral'


for item in data:
    num = item['compound']
    print(num, ' is', evaluate(num))

output:

-0.7178  is negative
0.0516  is neutral
0.2263  is positive
-0.1027  is negative
-0.6597  is negative

Your example works, aviya, but how can I apply this over a field in a dataframe? That's where I am stuck. It seems like you have a list. I have a dictionary within a field in a dataframe; df['sentiment']. I can replace this field, or add a new field next to this one. It doesn't matter. I want to do whatever is easier. Thanks. — ASH, Feb 10 '20 at 23:55

score 1 · Accepted Answer · answered Feb 10 '20 at 23:52

You can use apply() to read x['compound'] from each dictionary and convert it to "negative", "positive" or "neutral".

def convert(x):
    if x <= 0:
        return "negative"
    elif x > .2:
        return "positive"
    else:
        return "neutral"

df['result'] = df['sentiment'].apply(lambda x:convert(x['compound']))

Minimal working code

import nltk
# nltk.download('vader_lexicon')  # run once if the VADER lexicon is not installed yet
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import pandas as pd

def convert(x):
    if x <= 0:
        return "negative"
    elif x > .2:
        return "positive"
    else:
        return "neutral"

sid = SentimentIntensityAnalyzer()

df = pd.DataFrame({
    'review_text': ['bad', 'ok', 'fun', 'neutral']
})

df['sentiment'] = df['review_text'].apply(lambda x: sid.polarity_scores(x))

df['result'] = df['sentiment'].apply(lambda x:convert(x['compound']))

print(df[['review_text', 'result']])

Result

  review_text    result
0         bad  negative
1          ok  positive
2         fun  positive
3     neutral  negative
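
For larger frames, the row-wise convert() call could be swapped for a vectorized lookup. The following is only a sketch of that alternative, not part of the answer above: it uses the first three score dictionaries from the question in place of a live VADER call, and the extra `compound` column is a name chosen here for illustration.

```python
import numpy as np
import pandas as pd

# First three score dictionaries from the question, standing in for
# the output of sid.polarity_scores()
df = pd.DataFrame({
    'sentiment': [
        {'neg': 0.315, 'neu': 0.593, 'pos': 0.093, 'compound': -0.7178},
        {'neg': 0.215, 'neu': 0.556, 'pos': 0.229, 'compound': 0.0516},
        {'neg': 0.373, 'neu': 0.133, 'pos': 0.493, 'compound': 0.2263},
    ]
})

# Pull the 'compound' value out of each dictionary into a numeric column
# ('compound' as a column name is an assumption made for this sketch)
df['compound'] = df['sentiment'].apply(lambda d: d['compound'])

# Conditions are checked in order; rows matching neither get the default
conditions = [df['compound'] <= 0, df['compound'] > 0.2]
labels = ['negative', 'positive']
df['result'] = np.select(conditions, labels, default='neutral')

print(df[['compound', 'result']])
```

Keeping the numeric column next to the label also makes it easy to change the thresholds later without rerunning the sentiment scoring.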

score 0 · Answer 3 · answered Feb 10 '20 at 23:43

It looks like a dict, not a str. If it's a dict, you can extract the compound value from each row like this:

df['sentiment'].apply(lambda d: d['compound'])

If it's a str, you can split each string on the key and take the last part. Example:

df['Indexes'] = df['sentiment'].str.split("'compound': ").str[1].str.rstrip('}').astype(float)
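
If the column really does hold strings, another option is to parse each string back into a dictionary rather than splitting it by hand. This is a minimal sketch under the assumption that every string is a valid Python dict literal, for example after a round trip through CSV; the sample rows are copied from the question and the `compound` column name is chosen here for illustration.

```python
import ast

import pandas as pd

# Assumed case: the score dictionaries were stored as text
df = pd.DataFrame({
    'sentiment': [
        "{'neg': 0.315, 'neu': 0.593, 'pos': 0.093, 'compound': -0.7178}",
        "{'neg': 0.215, 'neu': 0.556, 'pos': 0.229, 'compound': 0.0516}",
    ]
})

# ast.literal_eval parses Python literals only; it does not run arbitrary code
df['sentiment'] = df['sentiment'].apply(ast.literal_eval)

# With real dictionaries in place, the value can be read by key
df['compound'] = df['sentiment'].apply(lambda d: d['compound'])

print(df['compound'].tolist())
```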

sunshineguy
