0

I wanted to perform a groupby on an account ID and then perform a count of values after group by and give their counts as a new column. How can I do it in pandas.

Eg:

Account Id     Values
1                     Open
2                     Closed
1                     Open
3                     Closed
2                     Open

Output must be:

Account Id    Open    Closed
1                      2             0
2                      1             1
3                      0             1
Alex Riley
  • 169,130
  • 45
  • 262
  • 238
George
  • 213
  • 1
  • 3
  • 10

2 Answers2

0

Use a groupby and value_counts to get the initial counts you want. Then unstack the multiindex to get a DataFrame and set null values to 0 to get the final results:

import pandas as pd

# Defining DataFrame
df = pd.DataFrame(index=range(5))
df['Account Id'] = [1, 2, 1, 3, 2]
df['Values'] = ['Open', 'Closed', 'Open', 'Closed', 'Open']

grouped = df.groupby('Account Id')['Values'].value_counts()
# Remove the multiindex present
grouped = grouped.unstack()
# Set null values to 0
result = grouped.where(pd.notnull(grouped), 0)

Output of result:

        Closed  Open
Account Id      
1           0   2
2           1   1
3           1   0

(Sorry, I'm not sure how to properly represent the DataFrame)

bastewart
  • 96
  • 4
  • I did something similar but want to get it as a data frame so that I can join with other existing data frames. – George Jun 18 '15 at 20:30
  • The result should be a dataframe. The intermediate 'grouped' is a multiindex series. The unstack changes it into a dataframe by making it single index. (Sorry I can't format this properly, on mobile now.) – bastewart Jun 18 '15 at 21:45
  • .reset_index() helped! Reference : http://stackoverflow.com/questions/10373660/converting-a-pandas-groupby-object-to-dataframe – George Jun 19 '15 at 18:39
  • Could you clarify exactly what it helped with, I'm slightly confused! The output of my code should be exactly the dataframe you requested. – bastewart Jun 19 '15 at 20:13
0

This would also return the dataframe for groupby object:

grouped_df = df.groupby(["Account Id","Values"])
grouped_df.size().reset_index(name = "Count")
Surya
  • 11,002
  • 4
  • 57
  • 39