How to return max value from a row from pandas dataframe taking into account values from the last row?

Question

Currently I'm returning the column name of the max value in each row.


    df['Active'] = df.idxmax(axis=1)

How do I take the Priority of each column into account? For example, for row 0 the Active column should contain opC, since it has a higher priority than opA. (Also, the Priority row shouldn't return anything in the Active column.)

Update: follow-up scenario. I'm adding an additional row called 'minOccurrence'. The examples below show the Priority-only case first, then the case with 'minOccurrence'. Since opD doesn't have 3 consecutive 1s, it isn't active at index 1 or 2, where it was previously active based on the 'Priority' row alone.

import pandas as pd

df1 = pd.DataFrame({'opA': [1,1,1,1,0],
                    'opB': [1,1,1,0,1],
                    'opC': [1,1,1,1,2],
                    'opD': [0,1,1,0,3],
                    'Active': ['opC','opD', 'opD', 'opC', 0]})
df1 = df1.rename(index={df1.last_valid_index() : 'Priority'})
df1.loc['Priority','Active'] = ''
print(df1)

df1 = pd.DataFrame({'opA': [1,1,1,1,0,0],
                    'opB': [1,1,1,0,1,0],
                    'opC': [1,1,1,1,2,0],
                    'opD': [0,1,1,0,3,3],
                    'Active': ['opC','opC', 'opC', 'opC', 0,0]})
df1 = df1.rename(index={df1.last_valid_index() - 1 : 'Priority'})
df1 = df1.rename(index={df1.last_valid_index() : 'minOccurrence'})
df1.loc['Priority','Active'] = ''
df1.loc['minOccurrence','Active'] = ''
print(df1)

Compare this with the case where opD has a 1 at index 0:

df1 = pd.DataFrame({'opA': [1,1,1,1,0,0],
                    'opB': [1,1,1,0,1,0],
                    'opC': [1,1,1,1,2,0],
                    'opD': [1,1,1,0,3,3],
                    'Active': ['opD','opD', 'opD', 'opC', 0,0]})
df1 = df1.rename(index={df1.last_valid_index() - 1 : 'Priority'})
df1 = df1.rename(index={df1.last_valid_index() : 'minOccurrence'})
df1.loc['Priority','Active'] = ''
df1.loc['minOccurrence','Active'] = ''
print(df1)

score 1 · Answer 1 · answered Apr 01 '22 at 00:14

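You need to re-sort the columns by priority before using idxmax, so that the highest-priority column comes first and wins when several columns share the maximum: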

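temp_cols = df.columns  # remember the original column order
# Put the highest-priority column first: idxmax returns the first column on ties
df = df.sort_index(axis=1, key=lambda x: df.loc['Priority', x], ascending=False)
df['Active'] = df.idxmax(axis=1)
df = df[list(temp_cols) + ['Active']]  # restore the original column order
df.loc['Priority', 'Active'] = ''  # the Priority row itself gets no label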
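
To see why the column order matters, here is a minimal sketch on a made-up three-column frame (the data and the names `rows`, `order`, `plain` and `ranked` are assumptions for illustration, not from the question). `idxmax` returns the first column holding the maximum, so reordering the columns by descending priority changes which column wins a tie. The reordering is written with `sort_values` here instead of `sort_index(key=...)`, which should be equivalent for this purpose.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are observations, the last row holds priorities.
df = pd.DataFrame(
    {"opA": [1, 0, 0], "opB": [1, 1, 1], "opC": [1, 0, 2]},
    index=[0, 1, "Priority"],
)

rows = df.drop(index="Priority")

# Without reordering, ties go to the leftmost column.
plain = rows.idxmax(axis=1)

# Reorder columns so the highest priority comes first, then take idxmax.
order = df.loc["Priority"].sort_values(ascending=False).index
ranked = rows[order].idxmax(axis=1)

print(plain.tolist())   # row 0 is a three-way tie, so the leftmost column wins
print(ranked.tolist())  # the same tie now goes to the highest-priority column
```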

answered Apr 01 '22 at 00:14

Arnau

Thanks! Can you help with my updated question? @Arnau – Madhu Apr 01 '22 at 23:52

score 0 · Answer 2 · answered Apr 01 '22 at 00:16

Multiply each row's values by the column's priority, then pick the maximum result and sum all the row values, put it in a new column, and sort the column.

answered Apr 01 '22 at 00:16

Meelad Ghazipour

mozway · Answer 3 · 2022-04-01T01:48:45.527

0

You can do everything in a single shot using indexing.

Using multiplication by the priority as suggested by @Meelad:

df['Active'] = (df
               .loc[df.index!='Priority']
               .mul(df.loc['Priority'])
               .idxmax(1)
               )

Or by sorting the columns as suggested by @Arnau:

df['Active'] = (df
                .loc[df.index!='Priority']
                .sort_index(axis=1, key=lambda x: -df.loc['Priority',x])
                .idxmax(1)
                )

Reproducible input:

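import numpy as np
import pandas as pd

np.random.seed(0)
# 12 random 0/1 rows, matching the 12 data rows in the output below
df = pd.DataFrame(np.random.randint(0, 2, (12, 4)),
                  columns=['opA', 'opB', 'opC', 'opD'])
df.loc['Priority'] = range(4)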

Output:

          opA  opB  opC  opD Active
0           0    1    1    0    opC
1           1    1    1    1    opD
2           1    1    1    0    opC
3           0    1    0    0    opB
4           0    0    0    1    opD
5           0    1    1    0    opC
6           0    1    1    1    opD
7           1    0    1    0    opC
8           1    0    1    1    opD
9           0    1    1    0    opC
10          0    1    0    1    opD
11          1    1    1    1    opD
Priority    0    1    2    3    NaN
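
One caveat worth checking on your own data: the two variants are not always interchangeable. A column with priority 0 always multiplies to 0, so the multiplication variant can only pick it by falling back to the leftmost column. The sketch below uses a made-up two-column frame (hypothetical data and variable names) where the priority-0 column is not the leftmost one, and writes the column sort with `sort_values` rather than `sort_index(key=...)`.

```python
import pandas as pd

# Hypothetical frame: opB has priority 1, opA has priority 0 and is NOT leftmost.
df = pd.DataFrame(
    {"opB": [0, 1, 1], "opA": [1, 1, 0]},
    index=[0, 1, "Priority"],
)

rows = df.loc[df.index != "Priority"]
priority = df.loc["Priority"]

# Variant 1: multiply by priority. Row 0 becomes all zeros, so idxmax
# falls back to the leftmost column even though opB is inactive there.
by_mul = rows.mul(priority).idxmax(axis=1)

# Variant 2: reorder columns by descending priority, then idxmax on raw values.
order = priority.sort_values(ascending=False).index
by_sort = rows[order].idxmax(axis=1)

print(by_mul.tolist())
print(by_sort.tolist())
```

If every priority is strictly positive and the values are only 0 or 1, the two variants should agree on any row that has at least one 1.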

edited Apr 01 '22 at 01:48

answered Apr 01 '22 at 01:39

mozway

Thank you! Following up, if I had an additional row "minOccurrence" with |0|2|1|3| values meaning a column can only be active if it has that many (2, 1, 3) consecutive 1's. How would I approach this? – Madhu Apr 01 '22 at 01:42
This condition is unclear, can you provide an input output example? Please use text or a DataFrame constructor, not images. – mozway Apr 01 '22 at 01:44
Edit the question, don't use comments, this is impossible to read properly – mozway Apr 01 '22 at 01:47
I updated my answer with an example – mozway Apr 01 '22 at 01:49
Just updated my question. Appreciate it! – Madhu Apr 01 '22 at 01:52
But you used an image, not text, and you haven't provided the expected output. – mozway Apr 01 '22 at 01:53
Also the logic is still unclear. Seems the count is greater than 3 for the rows you mention. Which dimension are you talking about? – mozway Apr 01 '22 at 01:58
Back to the request of a [reproducible pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – mozway Apr 01 '22 at 02:04
Ah sorry. Let me know if it makes sense now. – Madhu Apr 01 '22 at 15:54
@madhu looks like you need to use `rolling` to identify the consecutive 1s – mozway Apr 01 '22 at 18:56
How do I determine the active column while using rolling to identify the consecutive 1s? @mozway – Madhu Apr 02 '22 at 15:45
trying df = df.sort_index(axis=1,key=lambda x:df.rolling(df.loc['minOccurrence',x]),ascending=False) but getting "window must be an integer 0 or greater" – Madhu Apr 02 '22 at 15:58
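
None of the answers above covers the 'minOccurrence' follow-up, so the following is only a sketch under an assumed reading of the question: a 1 counts as active only if it belongs to a run of at least `minOccurrence` consecutive 1s in that column, and every row of such a run counts (the expected output in the question marks index 0 as opD when opD is 1, 1, 1). Because the first row of a run must qualify too, the sketch measures run lengths with `shift`/`cumsum`/`groupby` instead of `rolling`. The helper name `active_with_min_run` and the frames `df_short` and `df_long` are made up for illustration.

```python
import pandas as pd


def active_with_min_run(df, priority_label="Priority", min_label="minOccurrence"):
    """Hypothetical helper: highest-priority column whose 1 sits in a long enough run."""
    rows = df.drop(index=[priority_label, min_label])
    priority = df.loc[priority_label]
    min_run = df.loc[min_label]

    def run_length(col):
        # A new run starts wherever the value differs from the previous row.
        run_id = col.ne(col.shift()).cumsum()
        # Give every row the size of the run it belongs to.
        return col.groupby(run_id).transform("size")

    runs = rows.apply(run_length)
    # Eligible: the cell is 1 and its run is at least as long as the column requires.
    eligible = rows.eq(1) & runs.ge(min_run, axis=1)

    # Highest priority first, so idxmax picks it when several columns are eligible.
    order = priority.sort_values(ascending=False, kind="stable").index
    active = eligible[order].astype(int).idxmax(axis=1)
    # Rows with no eligible column get NaN instead of an arbitrary label.
    return active.where(eligible.any(axis=1))


index = [0, 1, 2, 3, "Priority", "minOccurrence"]
# The question's two scenarios, without the hand-written 'Active' column.
df_short = pd.DataFrame(
    {"opA": [1, 1, 1, 1, 0, 0],
     "opB": [1, 1, 1, 0, 1, 0],
     "opC": [1, 1, 1, 1, 2, 0],
     "opD": [0, 1, 1, 0, 3, 3]},
    index=index,
)
df_long = df_short.copy()
df_long.loc[0, "opD"] = 1  # opD now has three consecutive 1s

print(active_with_min_run(df_short).tolist())
print(active_with_min_run(df_long).tolist())
```

On these two frames the result matches the 'Active' values written by hand in the question. If you instead want a column to become active only once the run has already reached the required length, a trailing `rolling` window per column would be the closer fit.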
