How to select column names in python pandas by dataframe values?

Question

I have the following dataframe:

pandas.DataFrame(numpy.random.randn(10, 5) > 1, index=range(1, 11), columns=list('ABCDE'))

        A      B      C      D      E
1   False  False  False  False  False
2   False  False  False  False  False
3    True   True  False   True  False
4   False  False   True   True  False
5   False  False  False  False  False
6   False  False  False  False  False
7   False  False  False  False  False
8   False  False  False  False  False
9   False  False  False  False  False
10  False   True  False   True  False

For each row I would like to get the column name that is the last one in that row containing True. If there isn't any, return any resonable value. How can I do that?

you need to re-examine your example code. `list('ABCDE')` ?? — JD Long, Nov 13 '14 at 21:11
@JDLong what's wrong with that? Calling `list()` on a string returns each character in the string individually, which is what is desired here. It's easier than writing `columns=['A', 'B', 'C', 'D', 'E']`. — MattDMo, Nov 13 '14 at 21:21
sorry about that... i had a list named list in my workspace. Oops! — JD Long, Nov 14 '14 at 14:27

score 0 · Answer 1 · answered Nov 13 '14 at 21:24

set up the example data first:

np.random.seed(1)
df = pd.DataFrame( (randn(10, 5) > 1) , index=range(1, 11), columns=['A','B','C','D','E'])
df

looks like:

        A      B      C      D      E
1    True  False  False  False  False
2   False   True  False  False  False
3    True  False  False  False   True
4   False  False  False  False  False
5   False   True  False  False  False
6   False  False  False  False  False
7   False  False  False  False  False
8   False  False  False   True  False
9   False  False  False   True  False
10  False  False   True  False  False

it sounds like what you want to do is get the index # for each true value and then select the max index #. On a single column that might look like the following:

df['A'][df['A']].index.max()

which returns 3. To do this for all the columns, the easiest is to iterate through each column and shove the result in a list:

mylist = []
for col in df.columns:
    myval = df[col][df[col]].index.max()
    mylist.append(myval)

mylist

that returns:

[3, 5, 10, 9, 3]

the loop logic above returns nan if there is no True value in the column.

I wanted the same thing but per row and not column. Meaning `['A', 'B', 'E', NaN, 'B', NaN, 'Nan', 'D', 'D', 'C']` — user1942586, Nov 14 '14 at 10:31
Well, with a transpose on the original frame and the result this actually works — user1942586, Nov 27 '14 at 13:57

elyase · Accepted Answer · 2014-11-13T21:56:28.273

A one liner:

>>> value = np.nan
>>> df.reindex_axis(df.columns[::-1], axis=1)\  # flip vertically
      .idxmax(axis=1)\                          # find last(now first) True value 
      .reset_index()\                           # get index for the next step
      .apply(lambda x: value if (x[0]==df.columns[-1] and not df.ix[x['index'], x[0]]) 
                         else x[0], axis=1)     # =value if col=="E" and value==False
Out [1]:
0    NaN
1    NaN
2      D
3      D
4    NaN
5    NaN
6    NaN
7    NaN
8    NaN
9      D

Explanation:

idxmax returns the index of the max value in a row, if there is more than one max it returns the first one. We want the last one so we flip the dataframe vertically.

Finally we must replace the obtained Series with value if col=="E" and value==False. You can't apply a condition on the index of a Series, thats why you need the reset_index first.

This last step could be more elegantly done with df.replace({'E': {False: value}), which replaces False in column 'E' with value, but somehow it doesn't work for me.

Yep, pandas should have an easier way to do such things, like applying a condition on an index, see [here](http://stackoverflow.com/questions/18316211/access-index-in-pandas-series-apply). — elyase, Nov 27 '14 at 18:02

How to select column names in python pandas by dataframe values?

2 Answers2