Group Pandas dataframe by two columns and output the maximum column value indication to new column

Question

I have a Pandas dataframe, which I need to group by two different columns to check which value in a column is the highest. However, if the value in the first choice exists or is already the highest, there is no need to check the second subgroup. I have looked into Get the row(s) which have the max value in groups using groupby, but I have to make some additional checks, which I have not been able to do.

Example:

df = pd.DataFrame({
    'First': ['KAT1', 'KAT1', 'KAT2', 'KAT3', 'KAT3', 'KAT4', 'KAT4', 'KAT4', 'KAT4'],
    'Second': ['E', 'M', 'M', 'E', 'E', 'E', 'M', 'M', 'E'],
    'Value': [20, 28, 25, 26, 24, 19, 23, 24, 25]
})


df
  First Second  Value
0  KAT1      E     20
1  KAT1      M     28
2  KAT2      M     25
3  KAT3      E     26
4  KAT3      E     24
5  KAT4      E     19
6  KAT4      M     23
7  KAT4      M     24
8  KAT4      E     25

First, it would need to group by column 'First' and then by 'Second' with preference to value 'E'. Then find the maximum value in that subgroup. If 'E' does not exist, it would need to check 'M', and find the maximum value in that subgroup. Values can be tied, both would then be written to the new column as True.

Expected output:

  First Second  Value  Ismax
0  KAT1      E     20   True
1  KAT1      M     28  False
2  KAT2      M     25   True
3  KAT3      E     26   True
4  KAT3      E     24  False
5  KAT4      E     19  False
6  KAT4      M     23  False
7  KAT4      M     24  False
8  KAT4      E     25   True

I need to modify the dataframe accordingly, as seen in the Expected output. — swisscheese_95, Jan 11 '22 at 10:53

jezrael · Answer 1 · 2022-01-11T11:08:09.947

If only E and M values in Second column you can use:

#get E rows
m1 = df['Second'].eq('E')
#get groups with at least one E per First
m2 = df['First'].isin(df.loc[m1, 'First'])
#filter E groups if exist else M groups (like KAT2) and compare maximum 
df['Ismax'] = df[(m1 & m2) | (~m1 & ~m2)]
                        .groupby('First')['Value'].transform('max').eq(df['Value'])

print (df)
  First Second  Value  Ismax
0  KAT1      E     20   True
1  KAT1      M     28  False
2  KAT2      M     25   True
3  KAT3      E     26   True
4  KAT3      E     24  False
5  KAT4      E     19  False
6  KAT4      M     23  False
7  KAT4      M     24  False
8  KAT4      E     25   True

Details:

print (df[(m1 & m2)])
  First Second  Value
0  KAT1      E     20
3  KAT3      E     26
4  KAT3      E     24
5  KAT4      E     19
8  KAT4      E     25

print (df[(~m1 & ~m2)])
  First Second  Value
2  KAT2      M     25

print (df[(m1 & m2) | (~m1 & ~m2)])
  First Second  Value
0  KAT1      E     20
2  KAT2      M     25
3  KAT3      E     26
4  KAT3      E     24
5  KAT4      E     19
8  KAT4      E     25

Group Pandas dataframe by two columns and output the maximum column value indication to new column

1 Answers1