
I have a pandas DataFrame with 5,000+ rows and approximately 10 True/False columns. In each row, exactly one column is True and the remaining nine are False.

# Import library
import pandas as pd

# Create dictionary and convert to pd DF
test = {"col1":[True, False, True, True, False],
        "col2":[False, True, False, False, True]}

test = pd.DataFrame(test)

# Display the DataFrame
print(test)

The DataFrame should look like this:

    col1    col2
0   True    False
1   False   True
2   True    False
3   True    False
4   False   True

I am hoping to return an array with the following values:

output_array = ['col1','col2','col1','col1','col2']

I'm stuck. I suspect I should use some sort of apply method over the 10 columns, but I'm not sure of the best way to check each row for its True entry and return the corresponding column name. Any help is much appreciated, and thank you!

Seanycase

1 Answer

true_col_name = test.idxmax(axis=1)

This gives you a Series containing, for each row, the name of the column that holds the True value, assuming there is in fact exactly one True value per row.

In [6]: test.idxmax(axis=1)
Out[6]: 
0    col1
1    col2
2    col1
3    col1
4    col2
dtype: object
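
Since the question asks for an array rather than a Series, here is a minimal sketch of the conversion, using the question's sample data. The names `output_list` and `output_array` are only illustrative; `tolist()` and `to_numpy()` are standard Series methods.

```python
import pandas as pd

# Sample data from the question
test = pd.DataFrame({"col1": [True, False, True, True, False],
                     "col2": [False, True, False, False, True]})

# For each row, idxmax(axis=1) returns the label of the first column
# holding the row maximum; with booleans, True ranks above False
true_col_name = test.idxmax(axis=1)

# Convert the Series to a plain Python list or a NumPy array
output_list = true_col_name.tolist()
output_array = true_col_name.to_numpy()

print(output_list)
```

Either form should work as the `output_array` described in the question; which one to use depends on what the downstream code expects.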
Ferris
Steele Farnsworth
  • Note that it will work even if there are multiple `True` values; the tie-breaker just goes to the leftmost `True`. – tdy Jun 29 '21 at 03:25
  • How can I get all the columns (sometimes multiple) with True values? – user77005 Aug 06 '23 at 06:56
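
To illustrate the two comments above, here is a rough sketch on made-up data (the frame `df` and the names `first_true`, `has_true` and `all_true` are hypothetical, not from the original answer). It shows the leftmost-`True` tie-breaker, and one possible way to collect every `True` column per row by masking the column labels. Note that a row with no `True` at all still gets the first column from `idxmax`, so such rows may need to be flagged separately.

```python
import pandas as pd

# Hypothetical data: row 0 has one True, row 1 has three, row 2 has none
df = pd.DataFrame({"col1": [True, True, False],
                   "col2": [False, True, False],
                   "col3": [False, True, False]})

# idxmax reports only the first (leftmost) column holding the row maximum
first_true = df.idxmax(axis=1)

# Flag rows that contain at least one True, since an all-False row
# still yields the first column label from idxmax
has_true = df.any(axis=1)

# Collect every True column per row by using each row as a boolean mask
all_true = [df.columns[mask].tolist() for mask in df.to_numpy()]

print(first_true.tolist())
print(all_true)
```

The list comprehension is just one option; for very wide or very long frames a different approach may be preferable.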