I'm new to Python and trying to create a simple pandas DataFrame from the for loop below. The loop (1) iterates through each chapter of the book (chapters) and tokenizes it by sentence, (2) gets the polarity scores for each sentence and adds them to running totals in a dictionary ('sentiments'), and (3) divides each total by the number of sentences to get the chapter average. The output is one dictionary of 4 scores for each chapter.
I need to create a DataFrame with 28 rows (1 per chapter) and 4 columns (1 per score in each dictionary). What's the simplest way to accomplish this?
from nltk import tokenize
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Each entry is a string holding the full text of one chapter (defined elsewhere).
chapters = [ainulindale, valaquenta, ch1, ch2, ch3, ch4, ch5, ch6, ch7, ch8, ch9, ch10,
            ch11, ch12, ch13, ch14, ch15, ch16, ch17, ch18, ch19, ch20, ch21, ch22,
            ch23, ch24, akallabeth, rings]

analyzer = SentimentIntensityAnalyzer()

for chapter in chapters:
    sentence_list = tokenize.sent_tokenize(chapter)
    # Running totals for this chapter, reset on every iteration.
    sentiments = {'compound': 0.0, 'neg': 0.0, 'neu': 0.0, 'pos': 0.0}

    for sentence in sentence_list:
        vs = analyzer.polarity_scores(sentence)
        sentiments['compound'] += vs['compound']
        sentiments['neg'] += vs['neg']
        sentiments['neu'] += vs['neu']
        sentiments['pos'] += vs['pos']

    # Convert the totals into per-chapter averages.
    sentiments['compound'] = sentiments['compound'] / len(sentence_list)
    sentiments['neg'] = sentiments['neg'] / len(sentence_list)
    sentiments['neu'] = sentiments['neu'] / len(sentence_list)
    sentiments['pos'] = sentiments['pos'] / len(sentence_list)

    print(sentiments)
The output for the print statement looks like this:
{'compound': 0.221757281553398, 'neg': 0.041514563106796104, 'neu': 0.8682621359223304, 'pos': 0.09024271844660196}
{'compound': 0.09577214285714292, 'neg': 0.06266428571428569, 'neu': 0.842964285714286, 'pos': 0.09440000000000001}
{'compound': 0.05855809523809526, 'neg': 0.06347619047619049, 'neu': 0.8621809523809518, 'pos': 0.07440000000000001}
{'compound': 0.1280093023255814, 'neg': 0.037604651162790693, 'neu': 0.8903488372093022, 'pos': 0.0720813953488372}
{'compound': -0.008434615384615398, 'neg': 0.07703076923076925, 'neu': 0.8496076923076921, 'pos': 0.07333846153846156}
{'compound': 0.20025294117647055, 'neg': 0.027411764705882358, 'neu': 0.910294117647059, 'pos': 0.06223529411764705}
{'compound': 0.24236, 'neg': 0.020013333333333327, 'neu': 0.9022666666666667, 'pos': 0.07770666666666666}
{'compound': 0.25085555555555544, 'neg': 0.056074074074074075, 'neu': 0.8129444444444446, 'pos': 0.1309814814814815}
{'compound': 0.02056170212765958, 'neg': 0.0704255319148936, 'neu': 0.8526382978723408, 'pos': 0.07694680851063829}
{'compound': -0.13621911764705882, 'neg': 0.09723529411764704, 'neu': 0.8521323529411767, 'pos': 0.05060294117647059}
{'compound': -0.07011322957198443, 'neg': 0.09842801556420237, 'neu': 0.8354124513618679, 'pos': 0.06617898832684826}
{'compound': 0.13921688311688318, 'neg': 0.04997402597402598, 'neu': 0.8669610389610388, 'pos': 0.083012987012987}
{'compound': 0.019619718309859153, 'neg': 0.08153521126760564, 'neu': 0.848169014084507, 'pos': 0.0702394366197183}
{'compound': 0.20739687499999998, 'neg': 0.04675, 'neu': 0.86025, 'pos': 0.09300000000000003}
{'compound': 0.05655333333333335, 'neg': 0.07552000000000003, 'neu': 0.8370933333333335, 'pos': 0.08737333333333329}
{'compound': 0.1834313253012048, 'neg': 0.03204819277108433, 'neu': 0.8945903614457832, 'pos': 0.07337349397590363}
{'compound': -0.058446464646464656, 'neg': 0.0901919191919192, 'neu': 0.8533737373737375, 'pos': 0.056434343434343434}
{'compound': 0.049436129032258073, 'neg': 0.06221935483870969, 'neu': 0.863077419354839, 'pos': 0.07469032258064519}
{'compound': 0.10077664233576646, 'neg': 0.053270072992700715, 'neu': 0.8727883211678833, 'pos': 0.07395620437956206}
{'compound': -0.09540880503144653, 'neg': 0.09535849056603773, 'neu': 0.8386918238993711, 'pos': 0.0659622641509434}
{'compound': -0.058940259740259765, 'neg': 0.08786363636363642, 'neu': 0.844915584415584, 'pos': 0.06720995670995672}
{'compound': -0.09371438356164379, 'neg': 0.09126712328767121, 'neu': 0.8470547945205481, 'pos': 0.06167808219178085}
{'compound': -0.10401964636542241, 'neg': 0.09612770137524558, 'neu': 0.8361139489194496, 'pos': 0.06777799607072695}
{'compound': -0.046306122448979595, 'neg': 0.07844217687074834, 'neu': 0.8614761904761906, 'pos': 0.06008163265306123}
{'compound': 0.05695540540540539, 'neg': 0.06936486486486487, 'neu': 0.8577702702702703, 'pos': 0.07287837837837836}
{'compound': -0.015284375000000006, 'neg': 0.07314843749999998, 'neu': 0.8589296875000001, 'pos': 0.06794531250000001}
{'compound': 0.05184410112359551, 'neg': 0.0851095505617977, 'neu': 0.82794382022472, 'pos': 0.08693258426966298}
{'compound': 0.023425435540069702, 'neg': 0.06889895470383278, 'neu': 0.8573484320557486, 'pos': 0.07374564459930318}
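For reference, here is a minimal sketch of the shape I'm after, not a definitive solution. It assumes the per-chapter dictionaries are collected in a list (the name chapter_scores is hypothetical) instead of being printed, and it uses only two of the dictionaries from the output above as stand-in data so it runs without nltk or vaderSentiment.

```python
import pandas as pd

# Stand-in data: two of the per-chapter dictionaries from the printed output.
# In the real loop, `chapter_scores.append(sentiments)` would take the place
# of `print(sentiments)`; `chapter_scores` is a hypothetical name.
chapter_scores = [
    {'compound': 0.24236, 'neg': 0.020013333333333327,
     'neu': 0.9022666666666667, 'pos': 0.07770666666666666},
    {'compound': 0.20739687499999998, 'neg': 0.04675,
     'neu': 0.86025, 'pos': 0.09300000000000003},
]

# A list of dictionaries becomes one row per dictionary,
# with the dictionary keys as column names.
df = pd.DataFrame(chapter_scores)

print(df.shape)  # (2, 4) here; 28 rows with all chapters
print(df)
```

Because the loop creates a fresh sentiments dictionary on every iteration, appending it to the list should not cause later chapters to overwrite earlier ones.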