Set column name for size()

Question

I'm trying to set a name for the size() column, like this:

x = monthly.copy()

x["size"] = x\
      .groupby(["sub_acct_id", "clndr_yr_month"]).transform(np.size)

But what I'm getting is

ValueError: Wrong number of items passed 15, placement implies 1

Why is this not working for my dataframe?

If I simply print the copy:

x = monthly.copy()
print(x)

this is what the table looks like:

sub_acct_id  clndr_yr_month
12716D       201601             219
             201602             265
12716G       201601             221
             201602             262
12716K       201601             181
             201602             149
...

What I'm trying to accomplish is to set the name of the column:

sub_acct_id  clndr_yr_month     size
12716D       201601             219
             201602             265
12716G       201601             221
             201602             262
12716K       201601             181
             201602             149
...

What about `x["size"] = x.groupby(["sub_acct_id", "clndr_yr_month"]).transform(len)` ? — jezrael, Jul 06 '16 at 12:48

jezrael · Accepted Answer · 2016-07-06T12:59:18.447

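You need to select a single column before calling transform. Otherwise transform is applied to every remaining column and returns a DataFrame (15 columns wide in your case, judging by the error), which cannot be assigned to the single column "size":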

x["size"] = x.groupby(["sub_acct_id", "clndr_yr_month"])['sub_acct_id'].transform('size')
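
To illustrate why the column selection matters, here is a minimal sketch on made-up data. The columns `amount` and `fee` are hypothetical stand-ins for the extra columns in `monthly`; the exact error text you get when assigning the wide result depends on your pandas version.

```python
import pandas as pd

# Hypothetical data: two key columns plus two extra value columns.
df = pd.DataFrame({
    'sub_acct_id': ['x', 'x', 'y'],
    'clndr_yr_month': ['a', 'a', 'b'],
    'amount': [10, 20, 30],
    'fee': [1, 2, 3],
})

grouped = df.groupby(['sub_acct_id', 'clndr_yr_month'])

# Transforming the whole group object gives one output column
# per non-key column, so the result is a DataFrame.
wide = grouped.transform(len)
print(wide.shape)

# Selecting one column first gives a Series aligned with df's index,
# which can be assigned to a single new column.
narrow = grouped['sub_acct_id'].transform('size')
df['size'] = narrow
print(df)
```

Any column can be selected for the `transform('size')` call, since the group size does not depend on the column's values.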

Sample:

import pandas as pd

df = pd.DataFrame({'clndr_yr_month': ['a', 'b', 'c', 'c', 'a', 'b', 'c', 'a', 'b'],
                   'sub_acct_id': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'z', 'z']})
print (df)
  clndr_yr_month sub_acct_id
0              a           x
1              b           x
2              c           x
3              c           x
4              a           y
5              b           y
6              c           y
7              a           z
8              b           z

df['size'] = df.groupby(['sub_acct_id', 'clndr_yr_month'])['sub_acct_id'].transform('size')
print (df)
  clndr_yr_month sub_acct_id  size
0              a           x     1
1              b           x     1
2              c           x     2
3              c           x     2
4              a           y     1
5              b           y     1
6              c           y     1
7              a           z     1
8              b           z     1

Another solution, if you want aggregated output (one row per group):

df = df.groupby(['sub_acct_id', 'clndr_yr_month']).size().reset_index(name='Size')
print (df)
  sub_acct_id clndr_yr_month  Size
0           x              a     1
1           x              b     1
2           x              c     2
3           y              a     1
4           y              b     1
5           y              c     1
6           z              a     1
7           z              b     1
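
If you would rather keep the two keys as the index, as in the desired output in the question, one possible variant is to name the column with `Series.to_frame` instead of resetting the index. This is only a sketch on a shortened version of the sample data, not necessarily what fits your real `monthly` frame:

```python
import pandas as pd

df = pd.DataFrame({
    'sub_acct_id': ['x', 'x', 'x', 'x', 'y'],
    'clndr_yr_month': ['a', 'b', 'c', 'c', 'a'],
})

# size() returns a Series indexed by the group keys;
# to_frame('size') turns it into a one-column DataFrame named 'size'.
sizes = df.groupby(['sub_acct_id', 'clndr_yr_month']).size().to_frame('size')
print(sizes)
```

The result keeps `sub_acct_id` and `clndr_yr_month` as a MultiIndex, so a group's count can be looked up with `sizes.loc[('x', 'c'), 'size']`.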
