I have a dataframe, something like:
index name message_counter
1 AA Counter({'hello':1})
2 BB Counter({'how':1, 'are':1, 'you':1})
3 BB Counter({'how':1})
4 AA Counter({'hello':1})
5 CC Counter({'hello':1})
I want a sum of all the counters from each unique name. So I did:
df.groupby('name')['message_counter'].sum()
and got the right answer, something like:
name
AA {'hello':2}
BB {'how':2, 'are':1, 'you':1}
CC {'hello':1}
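To pin down the result I expect, here is a minimal plain-Python sketch of the same per-name sum over the sample rows above. The `rows` list and the `sum_counters_by_name` helper are hypothetical names made up for this illustration, not part of my real code, and it assumes only the standard-library `collections` module.

```python
from collections import Counter, defaultdict

# Sample rows copied from the example frame above: (name, message_counter).
rows = [
    ("AA", Counter({"hello": 1})),
    ("BB", Counter({"how": 1, "are": 1, "you": 1})),
    ("BB", Counter({"how": 1})),
    ("AA", Counter({"hello": 1})),
    ("CC", Counter({"hello": 1})),
]


def sum_counters_by_name(rows):
    """Return {name: Counter} with all counters for each name added together.

    Hypothetical helper for illustration only.
    """
    totals = defaultdict(Counter)
    for name, counter in rows:
        # update() adds the counts into the existing Counter in place.
        totals[name].update(counter)
    return dict(totals)


totals = sum_counters_by_name(rows)
for name in sorted(totals):
    print(name, dict(totals[name]))
```

This is only meant to show the totals I am after; the pandas `groupby(...).sum()` call above is what I actually ran.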
But the groupby sum was surprisingly slow on my data set. It goes through 6 unique names and sums 33,000 Counters (the number of rows in my dataframe), which is not that much, but it took way longer than I expected: something like 50+ seconds, while the whole 180-line script otherwise doesn't take that much time.
What am I doing wrong? How can I improve this?