How to tokenize every sentence in a dataframe row into individual words and average the polarity of the words in each sentence?

Question

I have a df that looks like this:

       text
0   Thanks, I’ll have a read!
1   Am I too late

How do I apply TextBlob tokenization to each sentence and average the polarity scores of the words in that sentence?

For example, I can do this with a single sentence stored in a variable:

from textblob import TextBlob
import statistics as s

# tokenize the sentence into words
a = TextBlob("""Thanks, I'll have a read!""")
print(a.words)

    WordList(['Thanks', 'I', "'ll", 'have', 'a', 'read'])

# get polarity of every word
for i in a.words:
    print(a.sentiment.polarity)

    0.25
    0.25
    0.25
    0.25
    0.25
    0.25


# calculate the mean of the scores
c = []
for i in a.words:
    c.append(a.sentiment.polarity)
d = s.mean(c)
print(d)

    0.25

How do I apply a.words to every row of the dataframe column that holds the sentences?

New df:

      text                        score
0   Thanks, I’ll have a read!      0.25
1   Am I too late                  0.24

The closest I have come is getting the polarity of each whole sentence by applying this function to the dataframe:

def sentiment_calc(text):
    try:
        return TextBlob(text).sentiment.polarity
    except Exception:
        return None

df_sentences['sentiment'] = df_sentences['text'].apply(sentiment_calc)

Thank you in advance.
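
For reference, one possible way to score each row word by word is sketched below. It assumes textblob (with its tokenizer corpora) is installed and that the sentences live in a column named text. The helper names textblob_word_scores, mean_word_polarity and add_word_scores are made up for illustration. Each token is wrapped in its own TextBlob, so the average will generally differ from the sentence-level polarity that the loop above repeats for every word.

```python
from statistics import mean


def textblob_word_scores(text):
    """Return one polarity score per token, each token scored on its own."""
    # Imported here so a different scorer can be swapped in without TextBlob.
    from textblob import TextBlob
    return [TextBlob(str(word)).sentiment.polarity for word in TextBlob(text).words]


def mean_word_polarity(text, word_scores=textblob_word_scores):
    """Average the per-word polarity of one sentence; None if there is nothing to score."""
    if not isinstance(text, str):
        return None  # e.g. NaN in the dataframe column
    scores = word_scores(text)
    return mean(scores) if scores else None


def add_word_scores(df, column="text", word_scores=textblob_word_scores):
    """Return a copy of df with a 'score' column holding each row's average."""
    out = df.copy()
    out["score"] = out[column].apply(lambda t: mean_word_polarity(t, word_scores))
    return out
```

With the dataframe from the question this would be called as df_sentences = add_word_scores(df_sentences). Passing the scorer as an argument is only a convenience so that another lexicon or a stub can be tried without changing the row logic.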

@jezrael `sentiment_calc` is not working the way I want it to. `0.25` is the correct answer. I need the for loop that calculates the average applied to the `df` — RustyShackleford, Aug 28 '18 at 10:32
Yes, but `mean` is not what you need, because `for i in a.words: print( a.sentiment.polarity)` always returns the same value. And it is the same for each sentence. So there has to be something wrong, no idea what — jezrael, Aug 28 '18 at 10:34
Could you at least show me how to apply the for loop to the df? I can work on why the tokenization is not working and post an answer — RustyShackleford, Aug 28 '18 at 10:39
I think `df_sentences['sentiment'] = df_sentences['text'].apply(sentiment_calc)` is a good way. — jezrael, Aug 28 '18 at 10:39
Have you read this https://stackoverflow.com/questions/47769818/why-is-my-nltk-function-slow-when-processing-the-dataframe ? — alvas, Aug 29 '18 at 01:38
@alvas I have not but that is exactly what I am looking for. — RustyShackleford, Aug 29 '18 at 13:18

