My code is running really slowly. I have a dataset with about one hundred thousand rows. It contains the name of the person who wrote each post (a name can occur many times, since one person may have written multiple posts), the post itself, and another column holding just the "feeling" words extracted from that post. Feeling words are things like: ['happy', 'sad', 'delighted', 'excited', 'angry', 'disappointed', 'annoyed', 'disheartened', 'frightened', 'content', 'peaceful']... the list keeps going, but you get the point. Here's an example I made up:
   Message                                      Name   Feeling_Words
0  I am really happy with my progress.          Alice  [happy]
1  I am really happy with John's progress.      Alice  [happy]
2  I was annoyed by his inconsideration.        John   [annoyed]
3  I felt proud after seeing her performance.   Lisa   [proud]
4  I am ecstatic after hearing the good news.   Alice  [ecstatic]
5  I felt disappointed by her dishonesty.       Lisa   [disappointed]
6  I was disheartened by their actions.         John   [disheartened]
7  I am delighted about the good news. I am     Lisa   [delighted, proud]
   proud to represent our entire community
   for this occasion.
.........
I am using the following code to find the most common feeling words for each name. However, it is really slow: I am running it in Jupyter, and it has been going for about 30 minutes without finishing:
from collections import Counter

# Group all feeling words said by each name (using Counter)
df.groupby('Name')['Feeling_Words'].sum().apply(Counter)
# Find the most common feeling word per name
df.groupby('Name')['Feeling_Words'].sum().apply(
    lambda feel: Counter(feel).most_common(1))
# Find the total number of feeling words per name
df.groupby('Name')['Feeling_Words'].sum().apply(len)
What specifically makes this so slow: is it the apply(), the groupby(), or something else? Any suggestions to improve the run time of this code while keeping the same functionality would be greatly appreciated. Again, I want to a) group all the feeling words said by Alice, John, and so on, b) find the most frequently occurring feeling word for each name, and c) count the total number of feeling words for each name. I am fairly new to this, so I am unsure of other approaches. Thanks in advance!
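To make (a) to (c) concrete, here is a minimal sketch that computes all three on the made-up sample above in a single pass, without calling sum() on the lists. It is a sketch under assumptions, not a measured fix: it assumes each Feeling_Words entry is a Python list, and the function name `count_feelings` and the variable names are hypothetical. The likely reason it helps is that summing lists concatenates them one pair at a time, copying the growing list on every step, whereas updating a Counter only touches each word once.

```python
from collections import Counter, defaultdict


def count_feelings(names, feeling_lists):
    """Return {name: Counter of feeling words}, built in one pass over the rows."""
    counts = defaultdict(Counter)
    for name, feelings in zip(names, feeling_lists):
        # Counter.update adds the words in place; no list concatenation is needed.
        counts[name].update(feelings)
    return dict(counts)


# Made-up sample from the question (Name and Feeling_Words columns only).
names = ["Alice", "Alice", "John", "Lisa", "Alice", "Lisa", "John", "Lisa"]
feeling_lists = [
    ["happy"], ["happy"], ["annoyed"], ["proud"],
    ["ecstatic"], ["disappointed"], ["disheartened"], ["delighted", "proud"],
]

# (a) all feeling words per name, as counts
counts = count_feelings(names, feeling_lists)
# (b) most common feeling word per name, in the same shape as most_common(1)
most_common = {name: c.most_common(1) for name, c in counts.items()}
# (c) total number of feeling words per name
totals = {name: sum(c.values()) for name, c in counts.items()}

print(most_common["Lisa"])  # [('proud', 2)]
print(totals)               # {'Alice': 3, 'John': 2, 'Lisa': 4}
```

With the real DataFrame, the same function should accept the columns directly, for example `count_feelings(df["Name"], df["Feeling_Words"])`, since iterating a pandas Series yields its values.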