How do I create the counts of the column values, grouped by values in the other column in Pandas?

Question

I have a dataframe df that has values:

ID    Status
1       A
2       B
5       A
1       A
3       B
4       B
5       B

I need to group by the column Status and count the IDs in each group. The complication is that an ID can appear more than once, with the same or a different Status.

The code I have is:

df_new = df.groupby('ID').Status.nunique()

However, I am getting IDs grouped, without showing the Status column and their values. What I need to create is a dataset that looks like this:

Status  Count
  A      3
  B      4

score 3 · Answer 1 · answered Jul 27 '17 at 21:40

You need to groupby and count:

df.groupby('Status')['Status'].count()

Output:

Status
A    3
B    4
Name: Status, dtype: int64
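
For context, here is a minimal sketch that rebuilds the sample frame from the question and contrasts the question's attempt with the grouping above. The variable names `distinct_per_id` and `rows_per_status` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 5, 1, 3, 4, 5],
    "Status": ["A", "B", "A", "A", "B", "B", "B"],
})

# The question's attempt: one row per ID, holding the number of
# distinct Status values seen for that ID. ID 5 has two distinct
# statuses (A and B); every other ID has one.
distinct_per_id = df.groupby("ID")["Status"].nunique()

# The approach above: one row per Status, holding the number of rows
# with that Status (3 rows for A, 4 rows for B).
rows_per_status = df.groupby("Status")["Status"].count()

print(distinct_per_id)
print(rows_per_status)
```

Grouping by ID answers "how many different statuses does each ID have", while grouping by Status answers "how many rows does each status have", which is what the desired table asks for.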

Scott Boston

Why not `df.Status.value_counts()`? – Zero Jul 28 '17 at 05:11
I thought of that too on my drive home. I was going to edit this answer. Thanks, John for suggestion. – Scott Boston Jul 28 '17 at 06:06

score 1 · Answer 2 · answered Jul 27 '17 at 21:36

I don't know Pandas, but I know SQL, and the underlying concept of what you're doing is the same. You need to aggregate your data with a count function and group by the Status column.
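
To make the SQL analogy concrete, here is a rough sketch that runs the equivalent `GROUP BY` query in an in-memory SQLite database and the matching pandas call side by side. The table name `t` is an arbitrary choice for illustration, and the rows are the sample data from the question.

```python
import sqlite3

import pandas as pd

# Sample rows (ID, Status) copied from the question.
rows = [(1, "A"), (2, "B"), (5, "A"), (1, "A"), (3, "B"), (4, "B"), (5, "B")]

# SQL side: load the rows into an in-memory table and count per Status.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, Status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
sql_counts = conn.execute(
    "SELECT Status, COUNT(*) FROM t GROUP BY Status ORDER BY Status"
).fetchall()
conn.close()

# pandas side: group by Status, then count the rows in each group.
df = pd.DataFrame(rows, columns=["ID", "Status"])
pandas_counts = df.groupby("Status").size()

print(sql_counts)
print(pandas_counts)
```

Both sides report 3 rows for A and 4 rows for B.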

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.count.html

Also, see this similar SO answer:

https://stackoverflow.com/a/22391554/5129424

Pasted gist of answer here in case the link goes away:

import pandas as pd

df = pd.DataFrame({'a':list('abssbab')})
df.groupby('a').count()

score 0 · Answer 3 · edited Sep 27 '17 at 16:34

I think you need value_counts, rename_axis and reset_index for DataFrame:

# Store the result under a new name so that df itself is not overwritten.
counts = df['Status'].value_counts().rename_axis('Status').reset_index(name='Count')
print(counts)
  Status  Count
0      B      4
1      A      3

Or aggregate by GroupBy.size:

# Store the result under a new name so that df stays usable below.
counts = df.groupby('Status').size().reset_index(name='Count')
print(counts)
  Status  Count
0      A      3
1      B      4

EDIT:

But if you want the size per ID, selecting another column is not necessary; all three variants below give the same result:

df1 = df.groupby('ID')['Status'].size().reset_index(name='Count')
print (df1)
   ID  Count
0   1      2
1   2      1
2   3      1
3   4      1
4   5      2

df2 = df.groupby('ID')['ID'].size().reset_index(name='Count')
print (df2)
   ID  Count
0   1      2
1   2      1
2   3      1
3   4      1
4   5      2

df3 = df.groupby('ID').size().reset_index(name='Count')
print (df3)
   ID  Count
0   1      2
1   2      1
2   3      1
3   4      1
4   5      2

It is also possible to count each Status within each ID:

df4 = df.groupby('ID')['Status'].value_counts().reset_index(name='Count')
print (df4)
   ID Status  Count
0   1      A      2
1   2      B      1
2   3      B      1
3   4      B      1
4   5      A      1
5   5      B      1

Which is the same as:

df4 = df.groupby(['ID', 'Status']).size().reset_index(name='Count')
print (df4)
   ID Status  Count
0   1      A      2
1   2      B      1
2   3      B      1
3   4      B      1
4   5      A      1
5   5      B      1
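
As a quick check of that equivalence, the sketch below builds both results from the question's sample data and compares them as sorted rows. The helper `as_sorted_rows` is hypothetical and exists only for this comparison; sorting is used because `value_counts` orders rows within each ID by descending count, while `size` orders them by the group keys, so the row order can differ on other data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 5, 1, 3, 4, 5],
    "Status": ["A", "B", "A", "A", "B", "B", "B"],
})

via_value_counts = (
    df.groupby("ID")["Status"].value_counts().reset_index(name="Count")
)
via_size = df.groupby(["ID", "Status"]).size().reset_index(name="Count")


def as_sorted_rows(frame):
    """Return the (ID, Status, Count) rows as a sorted list of tuples."""
    cols = frame[["ID", "Status", "Count"]]
    return sorted(cols.itertuples(index=False, name=None))


# True when both approaches produce the same set of rows.
same = as_sorted_rows(via_value_counts) == as_sorted_rows(via_size)
print(same)
```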

What is the difference between size and count in pandas?
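
In short, `size` counts every row in a group, while `count` counts only the non-null values. A small sketch, assuming a hypothetical `Value` column that contains missing values (it is not part of the question's data):

```python
import pandas as pd

# Hypothetical frame: the Value column has missing entries.
df = pd.DataFrame({
    "Status": ["A", "A", "B", "B", "B"],
    "Value": [1.0, float("nan"), 2.0, float("nan"), float("nan")],
})

grouped = df.groupby("Status")["Value"]

sizes = grouped.size()    # rows per group, NaN included: A has 2, B has 3
counts = grouped.count()  # non-null values per group: A has 1, B has 1

print(sizes)
print(counts)
```

For the question's data the two agree, because neither column contains missing values.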

score 0 · Answer 4 · answered Jul 31 '17 at 00:37

For the output that you wish to create, a value_counts method on the variable Status would be sufficient.

import pandas as pd
df = pd.DataFrame(['A','B','A','A','B','B','B'])
df.columns=['Status']
df.Status.value_counts()

Asela Dassanayake
