How can I group by the count number of column values and sort it?

Question

I am a pandas learner.

I have original dataframe called data.log. Now I want to count the numbers group by 'c-ip-1', and sort the result.

The original data.log:

   sc-status  sc-substatus  sc-win32-status  time-taken       c-ip-1
0        200             0                0         986  31.7.188.55
1        200             0                0        2539  31.7.188.55
2        200             0                0        1172  31.7.188.56
3        200             0                0        3152  31.7.188.80
4        200             0                0        1091  31.7.188.80
...
99       200             0                0        1115  31.9.200.60
100      200             0                0        2000  31.9.200.61

The expect result is as follows:

         c-ip-1                 count
0        31.7.188.56            1     
1        31.9.200.61            1  
2        31.7.188.55            2  
...
34       31.9.200.60            5

I tried to write python code and run it, but it failed:

import pandas as pd

df = pd.read_table('data.log', sep=" ")

print(df[['c-ip-1']].groupby(['c-ip-1']).agg(['count'])

How can I use python solve the problem?

score 4 · Accepted Answer · edited Sep 27 '17 at 16:40

I think you need aggregate by GroupBy.size, then Series.sort_values and last Series.reset_index:

#better is more general separator `\s+` - one or more whitespaces
df = pd.read_table('data.log', sep="\s+")

df1 = df.groupby('c-ip-1').size().sort_values().reset_index(name='count')
print (df1)
        c-ip-1  count
0  31.7.188.56      1
1  31.9.200.60      1
2  31.9.200.61      1
3  31.7.188.55      2
4  31.7.188.80      2

What is the difference between size and count in pandas?

score 0 · Answer 2 · answered Jul 09 '17 at 06:39

0

You can use pd.Series.value_counts. By default, it sorts in descending order of count size. You can pass the parameter ascending=False to reverse that. Then rename the axis and column

df['c-ip-1'].value_counts(ascending=True) \
    .rename_axis('c-ip-1').reset_index(name='count')

        c-ip-1  count
0  31.9.200.61      1
1  31.9.200.60      1
2  31.7.188.56      1
3  31.7.188.55      2
4  31.7.188.80      2

answered Jul 09 '17 at 06:39

piRSquared

285,575
57
475
624

I'd appreciate some feedback on the down vote. I'd like the opportunity to fix any problem – piRSquared Jul 09 '17 at 07:16

score 0 · Answer 3 · answered Nov 20 '21 at 05:54

0

df[['c-ip-1']].groupby(['c-ip-1']).agg(c=('type', 'count')).sort_values("c")

answered Nov 20 '21 at 05:54

user1507435

328
4
13

How can I group by the count number of column values and sort it?

3 Answers3