
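I have a dataframe like the one below, and I'm trying to generate the RESULT column: for each group defined by the Set, Subset and Subsubset columns, RESULT should be the Class with the highest perc. I tried using a groupby on those three columns and returning idxmax on perc.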

| Set | Subset | Subsubset | Class | perc | RESULT |
|-----|--------|-----------|-------|------|--------|
|   1 | A      |         1 | good  |  100 | good   |
|   1 | A      |           | ok    |    0 | good   |
|   1 | A      |           | poor  |    0 | good   |
|   1 | A      |           | bad   |    0 | good   |
|   1 | A      |         2 | good  |   20 | bad    |
|   1 | A      |           | ok    |   10 | bad    |
|   1 | A      |           | poor  |   20 | bad    |
|   1 | A      |           | bad   |   50 | bad    |
|   1 | A      |         3 | good  |    0 | poor   |
|   1 | A      |           | ok    |   10 | poor   |
|   1 | A      |           | poor  |   80 | poor   |
|   1 | A      |           | bad   |   10 | poor   |
|   1 | B      |         1 | good  |   50 | good   |
|   1 | B      |           | ok    |    0 | good   |
|   1 | B      |           | poor  |    1 | good   |
|   1 | B      |           | bad   |   49 | good   |
|   1 | B      |         2 | good  |   60 | good   |
|   1 | B      |           | ok    |   10 | good   |
|   1 | B      |           | poor  |   20 | good   |
|   1 | B      |           | bad   |   10 | good   |

To clarify, the result will always be a single value (there will never be a tie, such as a 50/50 split).

There are hundreds of Sets, and Subsets run up to ZZ, so the table is very long.

This is different from the similar question "Python : Getting the Row which has the max value in groups using groupby", because here I am interested in grouping on MULTIPLE columns.

BAC83
  • Possible duplicate of [Python : Getting the Row which has the max value in groups using groupby](https://stackoverflow.com/questions/15705630/python-getting-the-row-which-has-the-max-value-in-groups-using-groupby) – jose_bacoy May 01 '19 at 14:28

1 Answer


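Since you mentioned idxmax, we can use idxmax with transform, which returns, for every row, the index label of the row holding the maximum perc in its group: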

idx = df.groupby(['Set', 'Subset', 'Subsubset'])['perc'].transform('idxmax')

df['RESULT'] = df.loc[idx, 'Class'].values  # alternative: df.Class.reindex(idx).values
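For anyone who wants to try this end to end, here is a minimal sketch on the first eight rows of the question's table. It assumes the blank Subsubset cells are missing values that belong to the Subsubset shown above them, so it forward-fills that column before grouping. That step is my assumption and is not stated in the question; skip it if the column is already populated in your real data. The `keys` variable is just an illustrative name.

```python
import pandas as pd

# Sample mirroring the first eight rows of the question's table.
# Assumption: blank Subsubset cells are NaN and inherit the value above.
df = pd.DataFrame({
    "Set": [1] * 8,
    "Subset": ["A"] * 8,
    "Subsubset": [1, None, None, None, 2, None, None, None],
    "Class": ["good", "ok", "poor", "bad"] * 2,
    "perc": [100, 0, 0, 0, 20, 10, 20, 50],
})

# Forward-fill so every row carries its group key; groupby would
# otherwise drop rows whose key is NaN.
df["Subsubset"] = df["Subsubset"].ffill()

keys = ["Set", "Subset", "Subsubset"]  # illustrative name

# For each row, the index label of the max-perc row within its group.
idx = df.groupby(keys)["perc"].transform("idxmax")

# Look up Class at those labels; to_numpy() discards the looked-up
# index so the values are assigned back by position.
df["RESULT"] = df.loc[idx, "Class"].to_numpy()

print(df)
```

On this sample, the four rows of Subsubset 1 get `good` (perc 100) and the four rows of Subsubset 2 get `bad` (perc 50), matching the RESULT column in the question.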
BENY