Select rows with highest value from groupby

Question

I have a dataframe that contains some information about users. There is a column for user, column for type, and column for count, like this:

name         type     count
robert       x        123
robert       y        456
robert       z        5123
charlie      x        442123
charlie      y        0 
charlie      z        42

I'm trying to figure out which type has the highest count per name, so for this case, I would want to select this:

name         type    count
robert       z       5123
charlie      x       442123

I know I can do something like this to get the max count per name, but I'm not sure how I can include the "type" column, which is actually the most important

df.sort_values('count', ascending=False).drop_duplicates('name').sort_index()

Any help is greatly appreciated!

Did you run your code `df.sort_values('count', ascending=False).drop_duplicates('name').sort_index() ` this output what your expected — BENY, Dec 18 '18 at 22:55

ayorgo · Answer 1 · 2018-12-18T23:21:46.513

11

Try this

df.loc[df.groupby('name')['count'].idxmax()]['type']

      name type   count
3  charlie    x  442123
2   robert    z    5123

In case you want not just a single max value but the top n values per group you can do (e.g. n = 2)

df.loc[df.groupby('name')['count'].nlargest(2).index.get_level_values(1)]

      name type   count
3  charlie    x  442123
5  charlie    z      42
2   robert    z    5123
1   robert    y     456

edited Dec 18 '18 at 23:21

answered Dec 18 '18 at 23:03

ayorgo

2,803
2
25
35

This seems like the more generalizable answer. – emigre459 Jul 29 '20 at 02:14

Steven Zindel · Answer 2 · 2020-02-27T12:02:13.150

5

Just sort on name and count, group by name and keep first.

df.sort_values(['name', 'count'],ascending=False).groupby(['name']).first().reset_index()

will give you:

    name type   count
3  charlie    x  442123
2   robert    z    5123

edited Feb 27 '20 at 12:02

answered Feb 27 '20 at 11:23

Steven Zindel

91
1
4

score 4 · Accepted Answer · answered Dec 18 '18 at 23:06

What if you have two maxes for a name with different types:

print(df)

      name type   count
0   robert    x     123
1   robert    y     456
2   robert    z    5123
3   robert    a    5123
4  charlie    x  442123
5  charlie    y       0
6  charlie    z      42

Use boolean indexing:

df[df['count'] == df.groupby('name')['count'].transform('max')]

Output:

      name type   count
2   robert    z    5123
3   robert    a    5123
4  charlie    x  442123

score 0 · Answer 4 · answered Nov 15 '21 at 09:12

0

(Very slightly) better yet, you can do:

df.loc[df.groupby('name')['count'].idxmax(),'type']

answered Nov 15 '21 at 09:12

zabop

6,750
3
39
84

Select rows with highest value from groupby

4 Answers4

Linked

Related