Pandas, group by count and add count to original dataframe?

Question

When trying to count rows with similar 'kind' in data frame:

import pandas as pd

items = [('aaa','aaa text 1'), ('aaa','aaa text 2'), ('aaa','aaa text 3'),
         ('bb', 'bb text 1'), ('bb', 'bb text 2'), ('bb', 'bb text 3'), 
         ('bb', 'bb text 4'),
         ('cccc','cccc text 1'), ('cccc','cccc text 2'),
         ('dd', 'dd text 1'),
         ('e', 'e text 1'),
         ('fff', 'fff text 1'),
        ]

df = pd.DataFrame(items, columns=['kind', 'msg'])
df

    kind    msg
0   aaa     aaa text 1
1   aaa     aaa text 2
2   aaa     aaa text 3
3   bb      bb text 1
4   bb      bb text 2
5   bb      bb text 3
6   bb      bb text 4
7   cccc    cccc text 1
8   cccc    cccc text 2
9   dd      dd text 1
10  e       e text 1
11  fff     fff text 1

This code works:

df = df[['kind']].groupby(['kind'])['kind'] \
                         .count() \
                         .reset_index(name='count') \
                         .sort_values(['count'], ascending=False) \
                         .head(5)

df

Resulting in:

    kind      count
    0   aaa   1
    1   bb    1
    2   cccc  1
    3   dd    1
    4   e     1

Yet, how can one get a data frame with all columns as in original one plus 'count' column? So the result should have columns 'kind', 'msg', 'count' in this order?

Also, how to sort this resulting data frame in descending order of count?

score 18 · Accepted Answer · answered Jul 27 '17 at 09:48

IIUC

In [247]: df['count'] = df.groupby('kind').transform('count')

In [248]: df
Out[248]:
    kind          msg  count
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1

sorting:

In [249]: df.sort_values('count', ascending=False)
Out[249]:
    kind          msg  count
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1

The transform('count') function returned multiple colums, so assigning it did not work. I had to specify which column I wanted to return. Is there something I am missing @MaxU? — leonard, Nov 15 '18 at 18:52
@leonard, I don’t know what do you want to achieve and how your dataset looks like. I’d recommend you to ask a new question and specify there a small sample dataset and your desired dataset — MaxU - stand with Ukraine, Nov 15 '18 at 20:13

score 14 · Answer 2 · edited Jun 30 '21 at 21:05

14

Here is the simple code to count the frequencies and add a column to the data frame when grouping by the kind column.

df['count'] = df.groupby('kind')['kind'].transform('count')

edited Jun 30 '21 at 21:05

Community

1
1

answered Sep 25 '18 at 05:47

Shubham Singh Chauhan

177
1
5

3

This should be accepted answer .. as per my knowledge – shantanu pathak Jan 04 '19 at 11:37

score 0 · Answer 3 · answered Jun 15 '22 at 11:42

This can also be achieved as part of a chain by

df.assign(
  count=lambda x: x.groupby('kind')['kind'].transform('count')
)

This is useful if you already have a chained expression, or you need to pass the dataframe with the extra column to a function but don't want to overwrite the dataframe.

Pandas, group by count and add count to original dataframe?

3 Answers3

Linked

Related