I am using .size() on a groupby result in order to count how many items are in each group.

I would like the result to be saved under a new column name without manually editing the column names array. How can it be done?

This is what I have tried:

grpd = df.groupby(['A','B'])
grpd['size'] = grpd.size()
grpd

and the error I got:

TypeError: 'DataFrameGroupBy' object does not support item assignment (on the second line)

mradwan
d1337
  • Worth noting that ``size`` is a bad choice for a column name, since it's a built-in attribute of every pandas object, so you can only retrieve the column through ``getitem`` and not through ``getattr``. – Meitham Dec 28 '17 at 17:17

5 Answers

The .size() built-in method of DataFrameGroupBy objects actually returns a Series object with the group sizes and not a DataFrame. If you want a DataFrame whose column is the group sizes, indexed by the groups, with a custom name, you can use the .to_frame() method and use the desired column name as its argument.

grpd = df.groupby(['A','B']).size().to_frame('size')

If you wanted the groups to be columns again you could add a .reset_index() at the end.
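
As a minimal sketch of both variants, assuming a small made-up frame with columns A and B (the data and the column name `n_items` are illustrative, not from the question):

```python
import pandas as pd

# Illustrative data, not taken from the question
df = pd.DataFrame({'A': ['x', 'x', 'x', 'y', 'y'],
                   'B': ['a', 'c', 'c', 'b', 'b']})

# Series of group sizes -> one-column DataFrame indexed by (A, B)
sizes = df.groupby(['A', 'B']).size().to_frame('n_items')

# Same data with the group keys moved back into ordinary columns
flat = sizes.reset_index()
print(flat)
```

A name other than `size` is used here so that the column stays reachable with attribute access as well as with `flat['n_items']`.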

Sealander

You need transform with size; the length of df stays the same as before:

Notice:

Here it is necessary to select one column after groupby, otherwise you get an error. Because GroupBy.size counts NaNs too, it does not matter which column is used. All columns work the same.

import pandas as pd

df = pd.DataFrame({'A': ['x', 'x', 'x','y','y']
                , 'B': ['a', 'c', 'c','b','b']})
print (df)
   A  B
0  x  a
1  x  c
2  x  c
3  y  b
4  y  b

df['size'] = df.groupby(['A', 'B'])['A'].transform('size')
print (df)
   A  B  size
0  x  a     1
1  x  c     2
2  x  c     2
3  y  b     2
4  y  b     2
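
The notice about NaN handling can be checked with a small sketch. The column `C` and its values are assumptions made up for illustration; the point is only that `size` gives the same result whichever column is selected, while `count` skips missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column C has one missing value
df = pd.DataFrame({'A': ['x', 'x', 'y'],
                   'B': ['a', 'a', 'b'],
                   'C': [1.0, np.nan, 3.0]})

g = df.groupby(['A', 'B'])

by_a = g['A'].transform('size')         # rows per group, via column A
by_c = g['C'].transform('size')         # same result: the NaN row is still counted
non_null_c = g['C'].transform('count')  # count ignores the NaN in C

print(by_a.tolist(), by_c.tolist(), non_null_c.tolist())
```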

If you need to set the column name when aggregating, the length of the resulting df is NOT the same as before:

import pandas as pd

df = pd.DataFrame({'A': ['x', 'x', 'x','y','y']
                , 'B': ['a', 'c', 'c','b','b']})
print (df)
   A  B
0  x  a
1  x  c
2  x  c
3  y  b
4  y  b

df = df.groupby(['A', 'B']).size().reset_index(name='Size')
print (df)
   A  B  Size
0  x  a     1
1  x  c     2
2  y  b     2
vikas_hada
jezrael
  • Nice one. But how do I do the same as `df.groupby(['A', 'B']).size().reset_index(name='Size')` if I have a multiple index? – Sotos Apr 26 '18 at 12:20
  • @Sotos If you use the latest version of pandas, the same way. – jezrael Apr 26 '18 at 12:34
  • So something like `...reset_index('V1', name = 'size')`? – Sotos Apr 26 '18 at 12:38
  • @Sotos Hmmm, it does not work this way. You need `.reset_index().rename(columns={'index':'col', 'anothercol':'col2'})` – jezrael Apr 26 '18 at 13:41
  • That is exactly what I did in the end... `full_df.set_index('cdatetime').groupby(['Cluster', 'source', 'action', pd.Grouper(freq='H', sort=True)]).size().reset_index(['Cluster', 'source', 'action']).rename(columns={0: 'cnt'})` – Sotos Apr 26 '18 at 13:44
  • Awesome answer. Could you explain again why I will get an error just because GroupBy.size counts NaNs too? A TypeError is raised even though there are no NaNs in the dataframe. Thanks. – Bowen Liu Nov 23 '22 at 00:15
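
The rename approach from the comments above can be sketched on toy data. The frame and the name `cnt` are illustrative assumptions; the sketch relies on the unnamed size Series producing a column labelled `0` when only some index levels are reset:

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({'A': ['x', 'x', 'x', 'y', 'y'],
                   'B': ['a', 'c', 'c', 'b', 'b']})

sizes = df.groupby(['A', 'B']).size()  # unnamed Series with a two-level index

# Move only level 'A' into a column; the unnamed values land in a column labelled 0,
# which is then renamed
partial = sizes.reset_index(level='A').rename(columns={0: 'cnt'})
print(partial)
```

Level `B` stays in the index, so this is useful when some grouping keys should remain as the index.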

The result of df.groupby(...) is not a DataFrame. To get a DataFrame back, you have to apply a function to each group, transform each element of a group, or filter the groups.

It seems like you want a DataFrame that contains (1) all your original data in df and (2) the count of how much data is in each group. These things have different lengths, so if they need to go into the same DataFrame, you'll need to list the size redundantly, i.e., for each row in each group.

df['size'] = df.groupby(['A','B'])['A'].transform('size')

(Aside: It's helpful if you can show succinct sample input and expected results.)
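
To make the difference in lengths concrete, here is a rough sketch assuming a made-up frame with a hypothetical `val` column. It contrasts an aggregation (one row per group), a transform (one value per original row) and a filter (a subset of the original rows):

```python
import pandas as pd

# Illustrative data; 'val' is a hypothetical value column
df = pd.DataFrame({'A': ['x', 'x', 'x', 'y', 'y'],
                   'B': ['a', 'c', 'c', 'b', 'b'],
                   'val': [1, 2, 3, 4, 5]})

g = df.groupby(['A', 'B'])

per_group = g['val'].sum()                 # aggregate: one row per group
per_row = g['val'].transform('size')       # transform: aligned with the original rows
kept = g.filter(lambda grp: len(grp) > 1)  # filter: rows from groups with more than one row

print(len(per_group), per_row.tolist(), len(kept))
```

Only the transformed result has the same length and index as `df`, which is why it can be assigned directly as a new column.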

Dan Allan
  • I also found this which is almost equal (creates a new dataframe), but not sure how it compares with your solution in terms of efficiency http://stackoverflow.com/questions/10373660/converting-a-pandas-groupby-object-to-dataframe – d1337 Aug 02 '13 at 01:22
  • Moreover, your solution works well on a toy example, but on the actual data an error is returned http://pastebin.com/aCsMxCd5 – d1337 Aug 02 '13 at 09:57
  • In pandas 0.20.3, @jezrael's `df['size'] = df.groupby(['A','B']).A.transform(np.size)` works; without the `.A` you get "ValueError: Wrong number of items passed 2, placement implies 1", i.e. "got 2 columns, need 1". – denis Jul 14 '17 at 16:35

You can set the as_index parameter in groupby to False to get a DataFrame instead of a Series:

df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'], 'B': [1, 2, 2, 2]})

df.groupby(['A', 'B'], as_index=False).size()

Output:

   A  B  size
0  a  1     1
1  a  2     1
2  b  2     2
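
Since the question asks for a custom column name, a possible follow-up is to rename the resulting column. This is a sketch that assumes a pandas version in which `as_index=False` makes `.size()` return a DataFrame with a column named `size`, as in the output above; `n_items` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'], 'B': [1, 2, 2, 2]})

# Group sizes as a DataFrame, then rename the default 'size' column
counts = (df.groupby(['A', 'B'], as_index=False)
            .size()
            .rename(columns={'size': 'n_items'}))
print(counts)
```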
Mykola Zotko

Let's say n is the name of the DataFrame and cst is the column whose repeated items are being counted. The code below puts the count in a new column:

from collections import Counter
import pandas as pd

# count how often each value of column 'cst' occurs
cstn = Counter(n.cst)
cstlist = pd.DataFrame.from_dict(cstn, orient='index').reset_index()
cstlist.columns = ['name', 'cnt']
# look up the count for each row's 'cst' value
n['cnt'] = n['cst'].map(cstlist.set_index('name')['cnt'].to_dict())

Hope this will work
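
For a single column, a shorter route that should give the same counts is to map the column onto its own `value_counts()`. This swaps `Counter` for a pandas method, and the data below is made up for illustration:

```python
import pandas as pd

# Made-up data; 'cst' is the column whose repeated values are counted
n = pd.DataFrame({'cst': ['p', 'q', 'p', 'p', 'r']})

# value_counts() is indexed by the distinct values, so map() looks up each row's count
n['cnt'] = n['cst'].map(n['cst'].value_counts())
print(n)
```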