0

Let's say I have a table with 3 fields: client, city, sales, with sales being a float.

+--------+--------+-------+
| client |  city  | sales |
+--------+--------+-------+
| a      | NY     |     0 |
| a      | LA     |     1 |
| a      | London |     2 |
| b      | NY     |     3 |
| b      | LA     |     4 |
| b      | London |     5 |
+--------+--------+-------+

For each client, I would like to show what is the city with the greatest sales, and what those sales are, ie I want this output:

+--------+--------+-------+
| client |  city  | sales |
+--------+--------+-------+
| a      | London |     2 |
| b      | London |     5 |
+--------+--------+-------+

Any suggestions?

This table can be generated with:

df=pd.DataFrame()
df['client']= np.repeat( ['a','b'],3 )
df['city'] = np.tile( ['NY','LA','London'],2)
df['sales']= np.arange(0,6)

This is wrong because it calculates the 'maximum' of the city, and shows NY because it considers that N > L

max_by_id = df.groupby('client').max()

I can first create a dataframe with the highest sales, and then merge it with the initial dataframe to retrieve the city; it works, but I was wondering if there is a faster / more elegant way?

out = pd.merge( df, max_by_id, how='inner' ,on=['client','sales'] )

I remember doing something similar with cross apply statements in SQL but wouldn't know how to run a Pandas equivalent.

halfer
  • 19,824
  • 17
  • 99
  • 186
Pythonista anonymous
  • 8,140
  • 20
  • 70
  • 112
  • 2
    `df.loc[df.groupby('client')['sales'].idxmax()]` – user3483203 Sep 16 '19 at 16:30
  • Could you please elaborate on the logic behind this? Does it work also if I have other fields I want to retrieve? – Pythonista anonymous Sep 16 '19 at 16:39
  • I get an AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 622, in wrapper raise ValueError on my real data which I cannot replicate in the toy example. Mmm... – Pythonista anonymous Sep 16 '19 at 16:44
  • 2
    Possible duplicate of [Get the Row(s) which have the max value in groups using groupby](https://stackoverflow.com/questions/15705630/get-the-rows-which-have-the-max-value-in-groups-using-groupby) – Quang Hoang Sep 16 '19 at 16:44
  • @Pythonistaanonymous checkout my answer. Does that work? – harvpan Sep 16 '19 at 16:45

1 Answers1

1

You need to sort by sales and then groupby client and pick first

df.sort_values(['sales'], ascending=False).groupby('client').first().reset_index()

OR

As @user3483203:

df.loc[df.groupby('client')['sales'].idxmax()]

Output:

 client city    sales
0   a   London  2
1   b   London  5
harvpan
  • 8,571
  • 2
  • 18
  • 36