groupby: how to show max(field1) and the value of field2 corresponding to max(field1)?

Question

Let's say I have a table with 3 fields: client, city, sales, with sales being a float.

+--------+--------+-------+
| client |  city  | sales |
+--------+--------+-------+
| a      | NY     |     0 |
| a      | LA     |     1 |
| a      | London |     2 |
| b      | NY     |     3 |
| b      | LA     |     4 |
| b      | London |     5 |
+--------+--------+-------+

For each client, I would like to show what is the city with the greatest sales, and what those sales are, ie I want this output:

+--------+--------+-------+
| client |  city  | sales |
+--------+--------+-------+
| a      | London |     2 |
| b      | London |     5 |
+--------+--------+-------+

Any suggestions?

This table can be generated with:

df=pd.DataFrame()
df['client']= np.repeat( ['a','b'],3 )
df['city'] = np.tile( ['NY','LA','London'],2)
df['sales']= np.arange(0,6)

This is wrong because it calculates the 'maximum' of the city, and shows NY because it considers that N > L

max_by_id = df.groupby('client').max()

I can first create a dataframe with the highest sales, and then merge it with the initial dataframe to retrieve the city; it works, but I was wondering if there is a faster / more elegant way?

out = pd.merge( df, max_by_id, how='inner' ,on=['client','sales'] )

I remember doing something similar with cross apply statements in SQL but wouldn't know how to run a Pandas equivalent.

Could you please elaborate on the logic behind this? Does it work also if I have other fields I want to retrieve? — Pythonista anonymous, Sep 16 '19 at 16:39
I get an AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 622, in wrapper raise ValueError on my real data which I cannot replicate in the toy example. Mmm... — Pythonista anonymous, Sep 16 '19 at 16:44
Possible duplicate of [Get the Row(s) which have the max value in groups using groupby](https://stackoverflow.com/questions/15705630/get-the-rows-which-have-the-max-value-in-groups-using-groupby) — Quang Hoang, Sep 16 '19 at 16:44

harvpan · Answer 1 · 2019-09-16T16:42:02.413

1

You need to sort by sales and then groupby client and pick first

df.sort_values(['sales'], ascending=False).groupby('client').first().reset_index()

OR

As @user3483203:

df.loc[df.groupby('client')['sales'].idxmax()]

Output:

 client city    sales
0   a   London  2
1   b   London  5

edited Sep 16 '19 at 16:42

answered Sep 16 '19 at 16:33

harvpan

8,571
2
18
36

1

sorting is O(n log n); idxmax is O(n). The solution in the comment is more efficient – Marat Sep 16 '19 at 16:37
Right! added to the answer. @Marat – harvpan Sep 16 '19 at 16:42
`.head(1)` (or .drop_duplicates instead of groupby) in the first case. first might not return values from the same row with missing data. – ALollz Sep 16 '19 at 16:46

groupby: how to show max(field1) and the value of field2 corresponding to max(field1)?

1 Answers1