18

When trying to count rows with similar 'kind' in data frame:

import pandas as pd

items = [('aaa','aaa text 1'), ('aaa','aaa text 2'), ('aaa','aaa text 3'),
         ('bb', 'bb text 1'), ('bb', 'bb text 2'), ('bb', 'bb text 3'), 
         ('bb', 'bb text 4'),
         ('cccc','cccc text 1'), ('cccc','cccc text 2'),
         ('dd', 'dd text 1'),
         ('e', 'e text 1'),
         ('fff', 'fff text 1'),
        ]

df = pd.DataFrame(items, columns=['kind', 'msg'])
df

    kind    msg
0   aaa     aaa text 1
1   aaa     aaa text 2
2   aaa     aaa text 3
3   bb      bb text 1
4   bb      bb text 2
5   bb      bb text 3
6   bb      bb text 4
7   cccc    cccc text 1
8   cccc    cccc text 2
9   dd      dd text 1
10  e       e text 1
11  fff     fff text 1

This code works:

df = df[['kind']].groupby(['kind'])['kind'] \
                         .count() \
                         .reset_index(name='count') \
                         .sort_values(['count'], ascending=False) \
                         .head(5)

df

Resulting in:

    kind      count
    0   aaa   1
    1   bb    1
    2   cccc  1
    3   dd    1
    4   e     1

Yet, how can one get a data frame with all columns as in original one plus 'count' column? So the result should have columns 'kind', 'msg', 'count' in this order?

Also, how to sort this resulting data frame in descending order of count?

dokondr
  • 3,389
  • 12
  • 38
  • 62

3 Answers3

18

IIUC

In [247]: df['count'] = df.groupby('kind').transform('count')

In [248]: df
Out[248]:
    kind          msg  count
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1

sorting:

In [249]: df.sort_values('count', ascending=False)
Out[249]:
    kind          msg  count
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • 3
    The transform('count') function returned multiple colums, so assigning it did not work. I had to specify which column I wanted to return. Is there something I am missing @MaxU? – leonard Nov 15 '18 at 18:52
  • @leonard, I don’t know what do you want to achieve and how your dataset looks like. I’d recommend you to ask a new question and specify there a small sample dataset and your desired dataset – MaxU - stand with Ukraine Nov 15 '18 at 20:13
  • @leonard your query is exactly answered in second answer – shantanu pathak Jan 04 '19 at 11:36
14

Here is the simple code to count the frequencies and add a column to the data frame when grouping by the kind column.

df['count'] = df.groupby('kind')['kind'].transform('count')
Community
  • 1
  • 1
0

This can also be achieved as part of a chain by

df.assign(
  count=lambda x: x.groupby('kind')['kind'].transform('count')
)

This is useful if you already have a chained expression, or you need to pass the dataframe with the extra column to a function but don't want to overwrite the dataframe.

Olsgaard
  • 1,006
  • 9
  • 19