-1

I have a dataframe where I would like to find max value of each row between 1 and 5 values.

Condition 1 to 5: Columns -> A1, B1, C1, D1, E1, F1, G1, H1, J1

Find Max_Value: Columns -> A2, B2, C2, D2, E2, F2, G2, H2, J2

Max_Value (Column) is the expected output on the screenshot.

You may see the attached screenshot to understand the question better. Thanks in advance for your support!

enter image description here

import pandas as pd
df = pd.DataFrame({'Type': ['X', 'Y', 'Z'], 'ID': [1212, 2342, 2421], 
'A1': [1, 1, 1], 'A2': [0.3, 0.004, 0.86], 'B1': [2, 1.5, 3],       
'B2': [0.1, 0.8, 0.005], 'C1': [3, 2, 5], 'C2': [0.8, 0.2, 0.3],
'D1': [4, 2.5, 7], 'D2': [0.2, 0.4, 0.004], 'E1': [5, 3, 9],        
'E2': [0.9, 0.005, 0.99], 'F1': [6, 3.5, 11], 'F2': [0.4, 0.3, 0.4],
'G1': [7, 5, 13], 'G2': [0.6, 0.85, 0.003], 'H1': [8, 4.5, 15],     
'H2': [0.94, 0.088, 0.85], 'J1': [9, 5, 17], 'J2': [0.003, 0.0001,0.3]})
print(df)
Kahraman
  • 33
  • 8
  • have a look at [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Anurag Dabas Aug 02 '21 at 13:32

2 Answers2

1

You can use the pd.DataFrame.max function with the parameter axis=1.

    import pandas as pd
    df = pd.DataFrame({'a': [2,4,6], 'b': [3,4,5], 'c': [1,7,2]})

    # Select first 5 columns
    df = df.iloc[: , :5]

    # Get max per row
    maxValuesObj = pd.DataFrame({'max': df.max(axis=1)})
    
    # Assign column values
    maxValuesObj['column'] = df.columns.values
    
    print(maxValuesObj)

        | max    | column
    0   | 3      | a
    1   | 7      | b
    2   | 6      | c
jdub
  • 170
  • 10
  • Edited to properly take first 5 columns – jdub Aug 02 '21 at 13:59
  • Thanks for the solution. Unfortunately, It does not solve the problem. Because, respectively A1&A2, B1&B2... are a group. So, I would like to find the highest decimal value between 1 and 5 values. Pay attention that 1 is always on the column A1 but the value 5 is not under a specific column. This makes the question a little bit tricky. You may have a look at again to the screenshot that I have attached above. Example: For the first row, the search should be done between A1 and E2 columns and then find the highest decimal value. – Kahraman Aug 02 '21 at 20:09
1

Can you try the code below:

mask = df.filter(regex=r'[A-Z]1').stack().between(1, 5).values
df['MAX'] = df.filter(regex=r'[A-Z]2').stack()[mask].groupby(level=0).max()

The expected result for the first row: 0,9 (1 and 5 values are between A1 and E2). The expected result for the second row: 0,85 (1 and 5 values are between A1 and J2). The expected result for the third row: 0,86 (1 and 5 values are between A1 and C2)

>>> df
  Type    ID  A1     A2  ...  J1      J2   MAX
0    X  1212   1  0.300  ...   9  0.0030  0.90
1    Y  2342   1  0.004  ...   5  0.0001  0.85
2    Z  2421   1  0.860  ...  17  0.3000  0.86

[3 rows x 21 columns]
Corralien
  • 109,409
  • 8
  • 28
  • 52