1

In the code bellow each 'Person' has a corresponding 'Sales' value , but when I execute print(compData.max()) 'Vanessa' gets the value of '340' which in the initial 'df' belongs to 'Amy'

import numpy as np
import pandas as pd

data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
       'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
       'Sales':[200,120,340,124,243,350]}
df = pd.DataFrame(data)

compData = df.groupby('Company')

print(df)
print(compData.max())
print(df.loc[3])

Here is the output :

  Company   Person  Sales
0    GOOG      Sam    200
1    GOOG  Charlie    120
2    MSFT      Amy    340
3    MSFT  Vanessa    124
4      FB     Carl    243
5      FB    Sarah    350
          Person  Sales
Company                
FB         Sarah    350
GOOG         Sam    200
MSFT     Vanessa    340

Company       MSFT
Person     Vanessa
Sales          124

Also when I execute print(df.loc[3]) 'Vanessa' gets the right value

Marius.T
  • 25
  • 7
  • Use `compData = df.groupby('Company').max() print(compData.iloc[2])` – jezrael May 13 '20 at 08:47
  • that `groupby.max()` returns the max value per group for each column. for group `MSFT` the max of `Person` is `Vanessa` and max for `Sales` is `340`. `Vanessa` is max because of "bigger" beginning letter – luigigi May 13 '20 at 08:50
  • why would you close my question ? That doesn't answer my question . ```print(comData.iloc[2])``` bring up again 'Vanessa' with the wrong value of 340 . – Marius.T May 13 '20 at 08:55
  • @jezrael I think the author is wondering why Vanessa has the value 340 for that groupby. i dont think the problem is about the indexing – luigigi May 13 '20 at 08:55
  • 1
    @luigigi Thanks so much that makes perfect sense – Marius.T May 13 '20 at 08:56
  • 1
    @luigigi - oops, reopened. – jezrael May 13 '20 at 08:58
  • This is duplicate - https://stackoverflow.com/q/15705630 – jezrael May 13 '20 at 10:55

2 Answers2

0

As luigigi pointed out, groupby.max() returns the maximum value for each column.

To get what you want you can search for the index with the max value in 'Sales' like this:

idx = df.groupby('Company')['Sales'].transform(max) == df['Sales']
print (df[idx])

Company Person  Sales
0    GOOG    Sam    200
2    MSFT    Amy    340
5      FB  Sarah    350

stolen from here

maybe there is an easier way, but thats what I found

Pibe_chorro
  • 99
  • 1
  • 8
0

I would like to add an alternate solution to Pibe_chorro's answer which gives you directly the maximum sales and sales person per company including the original index:

compData.apply(lambda x: x[x.Sales == x.Sales.max()]).reset_index(level=[0])

which gives you:

    Company Person  Sales
5        FB  Sarah  350
0      GOOG    Sam  200
2      MSFT    Amy  340

You can sort by the index if needed using the pd.DataFrame.sort_index()

compData.apply(lambda x: x[x.Sales == x.Sales.max()]).reset_index(level=[0]).sort_index()

which results:

    Company Person  Sales
0      GOOG    Sam  200
2      MSFT    Amy  340
5        FB  Sarah  350
hof.and.or
  • 134
  • 1
  • 6