1

I have data like this, it's output of a groupby:

numUsers = df.groupby(["user","isvalid"]).count()

                      count     
user       isvalid               
5          0.0         1336  
           1.0          387 

But I need to have count of count_valid and count_invalid columns for each user, like this:

                    count_valid  count_invalid
user 
5                           387           1336
           

How can I do it in optimized way in Pandas?

AVEbrahimi
  • 17,993
  • 23
  • 107
  • 210

2 Answers2

3

You can use:

out = (df.groupby(["user","isvalid"]).count()
         .rename({0: 'count_invalid', 1: 'count_valid'}, level=1)
         ['count'].unstack()
       )

Output:

isvalid  count_invalid  count_valid
user                               
5                 1336          387

Or, more generic if you have multiple columns, using a MultiIndex:

out = (df.groupby(["user","isvalid"]).count()
         .unstack().rename(columns={0: 'invalid', 1: 'valid'}, level=1)
       )
out.columns = out.columns.map('_'.join)

Output:

      count_invalid  count_valid
user                            
5              1336          387

Or from the original dataset with a crosstab:

pd.crosstab(df['user'], df['isvalid'].map({0: 'count_invalid', 1: 'count_valid'}))
mozway
  • 194,879
  • 13
  • 39
  • 75
2

You can replace groupby_count by value_counts:

>>> (df.replace({'isvalid': {0: 'count_invalid', 1: 'count_valid'}})
       .value_counts(['user', 'isvalid']).unstack('isvalid')
       .rename_axis(columns=None))

      count_invalid  count_valid
user                            
5              1336          387

Another version with pivot_table:

>>> (df.replace({'isvalid': {0: 'count_invalid', 1: 'count_valid'}}).assign(count=1)
       .pivot_table(index='user', columns='isvalid', values='count', aggfunc='count')
       .rename_axis(columns=None))

      count_invalid  count_valid
user                            
5              1336          387
Corralien
  • 109,409
  • 8
  • 28
  • 52