2

I have a dataframe df that has values:

ID    Status
1       A
2       B
5       A
1       A
3       B
4       B
5       B

I need to group column ID by the column Status. The issue is that ID can have duplicates, that can have the same or different codes.

The code I have is:

df_new = df.groupby('ID').Status.nunique()

However, I am getting IDs grouped, without showing the Status column and their values. What I need to create is a dataset that looks like this:

Status  Count
  A      3
  B      4
Bonifacio2
  • 3,405
  • 6
  • 34
  • 54
Feyzi Bagirov
  • 1,292
  • 4
  • 28
  • 46

4 Answers4

3

You need to groupby and count:

df.groupby('Status')['Status'].count()

Output:

Status
A    3
B    4
Name: Status, dtype: int64
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
1

I don't know Pandas, but I know SQL, and underlying concept of what you're doing is the same. You need to aggregate your data with a count function, first. Then you can group by that status column.

Also, see this similar SO answer:

Pasted gist of answer here in case the link goes away:

df = pd.DataFrame({'a':list('abssbab')})
df.groupby('a').count()
0

I think you need value_counts, rename_axis and reset_index for DataFrame:

df = df['Status'].value_counts().rename_axis('Status').reset_index(name='Count')
print (df)
  Status  Count
0      B      4
1      A      3

Or aggregate by GroupBy.size:

df = df.groupby('Status').size().reset_index(name='Count')
print (df)
  Status  Count
0      A      3
1      B      4

EDIT:

But if want get size by column ID, another column is not necessary:

df1 = df.groupby('ID')['Status'].size().reset_index(name='Count')
print (df1)
   ID  Count
0   1      2
1   2      1
2   3      1
3   4      1
4   5      2

df2 = df.groupby('ID')['ID'].size().reset_index(name='Count')
print (df2)
   ID  Count
0   1      2
1   2      1
2   3      1
3   4      1
4   5      2

df3 = df.groupby('ID').size().reset_index(name='Count')
print (df3)
   ID  Count
0   1      2
1   2      1
2   3      1
3   4      1
4   5      2

But is possible use:

df4 = df.groupby('ID')['Status'].value_counts().reset_index(name='Count')
print (df4)
   ID Status  Count
0   1      A      2
1   2      B      1
2   3      B      1
3   4      B      1
4   5      A      1
5   5      B      1

What is same as:

df4 = df.groupby(['ID', 'Status']).size().reset_index(name='Count')
print (df4)
   ID Status  Count
0   1      A      2
1   2      B      1
2   3      B      1
3   4      B      1
4   5      A      1
5   5      B      1

What is the difference between size and count in pandas?

Graham
  • 7,431
  • 18
  • 59
  • 84
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

For the output that you wish to create, a value_counts method on the variable Status would be sufficient.

import pandas as pd
df = pd.DataFrame(['A','B','A','A','B','B','B'])
df.columns=['Status']
df.Status.value_counts()