0

I have the following dataframe:

pandas.DataFrame(numpy.random.randn(10, 5) > 1, index=range(1, 11), columns=list('ABCDE'))

        A      B      C      D      E
1   False  False  False  False  False
2   False  False  False  False  False
3    True   True  False   True  False
4   False  False   True   True  False
5   False  False  False  False  False
6   False  False  False  False  False
7   False  False  False  False  False
8   False  False  False  False  False
9   False  False  False  False  False
10  False   True  False   True  False

For each row I would like to get the column name that is the last one in that row containing True. If there isn't any, return any resonable value. How can I do that?

user1942586
  • 153
  • 1
  • 3
  • 16
  • you need to re-examine your example code. `list('ABCDE')` ?? – JD Long Nov 13 '14 at 21:11
  • @JDLong what's wrong with that? Calling `list()` on a string returns each character in the string individually, which is what is desired here. It's easier than writing `columns=['A', 'B', 'C', 'D', 'E']`. – MattDMo Nov 13 '14 at 21:21
  • doesnt on my box `TypeError: 'list' object is not callable` – JD Long Nov 13 '14 at 21:21
  • 1
    sorry about that... i had a list named list in my workspace. Oops! – JD Long Nov 14 '14 at 14:27

2 Answers2

0

set up the example data first:

np.random.seed(1)
df = pd.DataFrame( (randn(10, 5) > 1) , index=range(1, 11), columns=['A','B','C','D','E'])
df

looks like:

        A      B      C      D      E
1    True  False  False  False  False
2   False   True  False  False  False
3    True  False  False  False   True
4   False  False  False  False  False
5   False   True  False  False  False
6   False  False  False  False  False
7   False  False  False  False  False
8   False  False  False   True  False
9   False  False  False   True  False
10  False  False   True  False  False

it sounds like what you want to do is get the index # for each true value and then select the max index #. On a single column that might look like the following:

df['A'][df['A']].index.max()

which returns 3. To do this for all the columns, the easiest is to iterate through each column and shove the result in a list:

mylist = []
for col in df.columns:
    myval = df[col][df[col]].index.max()
    mylist.append(myval)

mylist

that returns:

[3, 5, 10, 9, 3]

the loop logic above returns nan if there is no True value in the column.

JD Long
  • 59,675
  • 58
  • 202
  • 294
0

A one liner:

>>> value = np.nan
>>> df.reindex_axis(df.columns[::-1], axis=1)\  # flip vertically
      .idxmax(axis=1)\                          # find last(now first) True value 
      .reset_index()\                           # get index for the next step
      .apply(lambda x: value if (x[0]==df.columns[-1] and not df.ix[x['index'], x[0]]) 
                         else x[0], axis=1)     # =value if col=="E" and value==False
Out [1]:
0    NaN
1    NaN
2      D
3      D
4    NaN
5    NaN
6    NaN
7    NaN
8    NaN
9      D

Explanation:

idxmax returns the index of the max value in a row, if there is more than one max it returns the first one. We want the last one so we flip the dataframe vertically.

Finally we must replace the obtained Series with value if col=="E" and value==False. You can't apply a condition on the index of a Series, thats why you need the reset_index first.

This last step could be more elegantly done with df.replace({'E': {False: value}), which replaces False in column 'E' with value, but somehow it doesn't work for me.

elyase
  • 39,479
  • 12
  • 112
  • 119
  • Very nice. At bit complicated to understand but works. – user1942586 Nov 27 '14 at 13:57
  • Yep, pandas should have an easier way to do such things, like applying a condition on an index, see [here](http://stackoverflow.com/questions/18316211/access-index-in-pandas-series-apply). – elyase Nov 27 '14 at 18:02