
I read this article earlier and noticed that the pandas apply function, iterrows, and plain for loops are terribly slow and inefficient ways of working with pandas DataFrames.

I am doing sentiment analysis on some text data, but using apply causes high memory usage and low speed, similar to what is shown in this answer.

%%time
data.merge(data.essay.apply(lambda s: pd.Series({'neg':sid.polarity_scores(s)['neg'],
                                                 'neu':sid.polarity_scores(s)['neu'],
                                                 'pos':sid.polarity_scores(s)['pos'],
                                                 'compound':sid.polarity_scores(s)['compound']})),
                       left_index=True, right_index=True)

How can I implement this using built-in NumPy or pandas functions?

Edit: the column contains essay text data.

s.ouchene
dracarys3

1 Answer


I found one way to make this operation faster: pandarallel.

With the default pandas apply, it took 9 min 24 s.

With pandarallel, the same operation completed in just 1 min 7 s (using 16 workers).
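For anyone who wants to try this, here is a rough sketch of how it could look. It is not the exact code behind the timings above. `fake_polarity_scores` is a hypothetical stand-in for `sid.polarity_scores` so the snippet runs without NLTK, and `add_scores` is a helper name made up for illustration. The sketch also calls the scorer once per row rather than four times, and only imports pandarallel when `parallel=True` is requested.

```python
import pandas as pd


def fake_polarity_scores(text):
    """Hypothetical stand-in for sid.polarity_scores (not the real VADER logic)."""
    words = text.lower().split()
    n = max(len(words), 1)
    pos = sum(w in {"good", "great"} for w in words) / n
    neg = sum(w in {"bad", "awful"} for w in words) / n
    return {"neg": neg, "neu": 1 - pos - neg, "pos": pos, "compound": pos - neg}


def add_scores(data, scorer, parallel=False, workers=16):
    """Score the 'essay' column once per row and join the results onto data."""
    if parallel:
        # Assumes pandarallel is installed; imported lazily so the
        # plain-pandas path works without it.
        from pandarallel import pandarallel

        pandarallel.initialize(nb_workers=workers)
        rows = data["essay"].parallel_apply(scorer)
    else:
        rows = data["essay"].apply(scorer)

    # rows is a Series of dicts; expand it into one column per key.
    scores = pd.DataFrame(rows.tolist(), index=data.index)
    return data.join(scores)


data = pd.DataFrame({"essay": ["good great day", "bad awful"]})
result = add_scores(data, fake_polarity_scores)
print(result)
```

With the real analyzer you would pass `sid.polarity_scores` as the scorer and set `parallel=True`. How much of the speedup comes from the extra workers and how much from scoring each essay only once will depend on your data and machine.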

dracarys3