Sorry for the long-winded question; I will try to be clear. What I have is a dataframe that looks a little like this:

In [1]: df
Out[1]:
likes num_comments words
0 324 1032 song
1 123 4399 bird
2 500 4993 pie
3 453 2003 bird
4 34 132 lucky
... ...
943 129 259 bird
944 594 5993 lucky
945 234 642 bird
946 95 813 song
947 461 6122 pie
I want to reduce this large dataframe into a smaller one by combining all rows that share the same word. I also want to sum their likes and number of comments, and add a column that counts how many times each word appeared in the original dataframe. That is, I want something like this:
In [2]: df2
Out[2]:
likes num_comments words frequency
0 419 2035 song 2
1 939 7303 bird 4
2 961 11115 pie 2
4 628 6155 lucky 2
etc...
But I am not even sure if this is possible. Note that the real dataframe is very large and contains a ton of unique words. I tried to combine the words with:
df1 = df.groupby('words')
But when I printed df1, all I got was this:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001146E8883D0>
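For reference, here is a minimal reproducible sketch of the transformation I am after, using a small stand-in for my real data (the aggregation names are my guesses at what I want; I don't know if this is the idiomatic approach):

```python
import pandas as pd

# Small stand-in for the real dataframe (first five rows from above)
df = pd.DataFrame({
    "likes": [324, 123, 500, 453, 34],
    "num_comments": [1032, 4399, 4993, 2003, 132],
    "words": ["song", "bird", "pie", "bird", "lucky"],
})

# Group on the "words" column, sum the numeric columns, and count
# how many rows each word had in the original frame ("frequency").
df2 = (
    df.groupby("words", as_index=False)
      .agg(likes=("likes", "sum"),
           num_comments=("num_comments", "sum"),
           frequency=("words", "size"))
)
print(df2)
```

On this toy input, "bird" appears twice, so its row should show likes 576, num_comments 6402, and frequency 2, while the other words keep their single-row values with frequency 1.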