I have a rather large data set and I am trying to calculate the sentiment of each document. I am using VADER to calculate sentiment with the following code, but the process takes over 6 hours to run. I am looking for any way to speed it up.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
# polarity_scores returns a dict of neg/neu/pos/compound scores for each document
%time full_trans['bsent'] = full_trans['body_text'].apply(lambda text: analyzer.polarity_scores(text))
Any thoughts would be great because looping through the rows like this is terribly inefficient.
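One direction I have been considering is parallelizing across processes, since each document is scored independently. Below is a rough sketch using multiprocessing, not something I have verified on the full data set. It assumes full_trans is a pandas DataFrame with a body_text column of strings, as in the code above, and it is written script-style because worker functions defined inside a notebook can be awkward to pickle on some platforms:

from multiprocessing import Pool
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = None

def init_worker():
    # Each worker process builds its own analyzer, so nothing
    # unpicklable has to be shipped between processes.
    global analyzer
    analyzer = SentimentIntensityAnalyzer()

def score(text):
    return analyzer.polarity_scores(text)

if __name__ == "__main__":
    texts = full_trans['body_text'].tolist()  # full_trans as above
    with Pool(initializer=init_worker) as pool:
        # chunksize batches the work to keep inter-process overhead low
        full_trans['bsent'] = pool.map(score, texts, chunksize=1000)

Since the scoring looks CPU-bound, I would expect this to scale roughly with core count, but I have not benchmarked it.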
As an example, I have run my code on a small sample of 100 observations. The results from the alternative forms of the code are below: the suggested list comprehension is first, my original apply call is second. It seems strange that there is essentially no performance difference between the two methods.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
# Deep copy so both timings run against identical data
transtest = full_transx.copy(deep=True)
# Suggested list comprehension
%time transtest['bsent'] = [analyzer.polarity_scores(text) for text in transtest['body_text']]
# Original apply version
%time full_transx['bsent'] = full_transx['body_text'].apply(lambda text: analyzer.polarity_scores(text))
Wall time: 4min 11s (list comprehension)
Wall time: 3min 59s (apply)
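For what it's worth, one way to check whether the iteration strategy matters at all is to time a single polarity_scores call: if roughly 100 of those calls account for the ~4 minute wall times above, then the per-document scoring dominates and the loop overhead is negligible. A minimal sketch, assuming the transtest frame and analyzer from above are still in scope:

import timeit

# Average the cost of one polarity_scores call over 10 runs
sample = transtest['body_text'].iloc[0]
per_call = timeit.timeit(lambda: analyzer.polarity_scores(sample), number=10) / 10
print(f"{per_call:.3f} s per document")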