
Given a dataframe that logs uses of some books like this:

Name   Type   ID
Book1  ebook  1
Book2  paper  2
Book3  paper  3
Book1  ebook  1
Book2  paper  2

I need to get the count of each book while keeping the other columns, to get this:

Name   Type   ID    Count
Book1  ebook  1     2
Book2  paper  2     2
Book3  paper  3     1

How can this be done?

Thanks!

Meghdeep Ray
Adrian Ribao

5 Answers


You want the following:

In [20]:
df.groupby(['Name','Type','ID']).size().reset_index(name='Count')

Out[20]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1

In your case the 'Name', 'Type' and 'ID' cols match in values, so we can groupby on all of them. size counts the rows in each group (count would have nothing left to count, since every column is a group key), and reset_index(name='Count') turns the result back into a flat dataframe.
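The same result as a self-contained snippet (the frame is rebuilt here from the question's data):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID':   [1, 2, 3, 1, 2],
})

# size() counts the rows in each (Name, Type, ID) group, which works even
# when every column is a grouping key; reset_index(name=...) flattens the
# grouped Series back into a dataframe with a 'Count' column
out = df.groupby(['Name', 'Type', 'ID']).size().reset_index(name='Count')
print(out)
```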

An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:

In [25]:
df['Count'] = df.groupby(['Name'])['ID'].transform('count')
df.drop_duplicates()

Out[25]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
EdChum
  • This seems to work, but if we had many more columns (as I have in other dataframes), wouldn't this hurt performance? Also, it is not very intuitive. – Adrian Ribao Jul 22 '15 at 18:00
  • The problem here is that grouping reduces the amount of information, so it won't necessarily yield your desired df in one go. I've updated my answer to show how it can be done in two steps, which is easier to understand. – EdChum Jul 22 '15 at 18:15

I think as_index=False should do the trick; pair it with size() so the group keys stay regular columns and the row count survives:

df.groupby(['Name','Type','ID'], as_index=False).size()

The count comes back in a column named size, which you can rename to Count.
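A self-contained sketch of this approach on the question's data (the frame is rebuilt here; `size()` is used because every column is a grouping key, so `count()` would have nothing left to count):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID':   [1, 2, 3, 1, 2],
})

# as_index=False keeps Name/Type/ID as columns; size() counts rows per group
counts = df.groupby(['Name', 'Type', 'ID'], as_index=False).size()
counts = counts.rename(columns={'size': 'Count'})
print(counts)
```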
jtlz2
jpobst

If you have many columns in a df it makes sense to use df.groupby(['foo']).agg(...). The .agg() function lets you choose what to do with the columns you don't want to apply an operation to. If you just want to keep them, use .agg({'col1': 'first', 'col2': 'first', ...}). Instead of 'first', you can also apply 'sum', 'mean' and others.
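A runnable sketch of the dict-style .agg on the question's data (grouping on 'Name' alone and keeping the first Type/ID seen in each group):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID':   [1, 2, 3, 1, 2],
})

grouped = df.groupby('Name')

# Keep the first value seen for the columns we are not aggregating over
out = grouped.agg({'Type': 'first', 'ID': 'first'})
out['Count'] = grouped.size()   # row count per group, aligned on 'Name'
out = out.reset_index()
print(out)
```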

NeStack
  • I use this because it gives custom names to new calculated columns. – Steve Scott Aug 23 '22 at 16:56
  • @SteveScott I actually didn't know about the option to give custom names to new columns. Can you provide an example? I will certainly be using it; I frequently come back to this answer to look up the exact syntax. – NeStack Aug 24 '22 at 17:33
  • @NeStack `.agg(col1_sum=('col1', 'sum'), col2_avg=('col2', 'mean'))` – Umer Aug 31 '22 at 13:39
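Umer's named-aggregation syntax (pandas >= 0.25) in a runnable form on the question's data; the output-column names here are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID':   [1, 2, 3, 1, 2],
})

# Named aggregation: keyword = (source column, aggregation function)
out = df.groupby('Name', as_index=False).agg(
    Type=('Type', 'first'),
    ID=('ID', 'first'),
    Count=('ID', 'count'),
)
print(out)
```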

Simplest way:

df.groupby(['col1', 'col2'], as_index=False).count(). Use as_index=False to keep the grouping columns as regular columns rather than the index (the default is True).

Alternatively, you can use df.groupby(['col1', 'col2']).count().reset_index()

NeStack

You can use value_counts() as well:

df.value_counts().reset_index(name='Count')

Output:

    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
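As a self-contained sketch (DataFrame.value_counts needs pandas >= 1.1; note it sorts by count, descending, by default — pass sort=False to keep group-key order):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID':   [1, 2, 3, 1, 2],
})

# value_counts() counts unique rows across all columns and returns a Series;
# reset_index(name=...) flattens it and names the count column
out = df.value_counts().reset_index(name='Count')
print(out)
```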
rhug123