I have a Pandas dataframe, which I need to group by two different columns to check which value in a column is the highest. However, if the value in the first choice exists or is already the highest, there is no need to check the second subgroup. I have looked into Get the row(s) which have the max value in groups using groupby, but I have to make some additional checks, which I have not been able to do.
Example:
df = pd.DataFrame({
'First': ['KAT1', 'KAT1', 'KAT2', 'KAT3', 'KAT3', 'KAT4', 'KAT4', 'KAT4', 'KAT4'],
'Second': ['E', 'M', 'M', 'E', 'E', 'E', 'M', 'M', 'E'],
'Value': [20, 28, 25, 26, 24, 19, 23, 24, 25]
})
df
First Second Value
0 KAT1 E 20
1 KAT1 M 28
2 KAT2 M 25
3 KAT3 E 26
4 KAT3 E 24
5 KAT4 E 19
6 KAT4 M 23
7 KAT4 M 24
8 KAT4 E 25
First, it would need to group by column 'First' and then by 'Second' with preference to value 'E'. Then find the maximum value in that subgroup. If 'E' does not exist, it would need to check 'M', and find the maximum value in that subgroup. Values can be tied, both would then be written to the new column as True.
Expected output:
First Second Value Ismax
0 KAT1 E 20 True
1 KAT1 M 28 False
2 KAT2 M 25 True
3 KAT3 E 26 True
4 KAT3 E 24 False
5 KAT4 E 19 False
6 KAT4 M 23 False
7 KAT4 M 24 False
8 KAT4 E 25 True