
Let's say I have a dataframe like this:

   a   b   c   d 
0  S   t   f   nan
1  S   t   t   nan
2  S   f   nan nan
3  Q   t   nan nan

I want to combine the last 3 columns into a single column as an array, but exclude the NaN values, so I end up with something like the following:

   a   b   c   d   e
0  S   t   f   nan [t, f]
1  S   t   t   nan [t, t]
2  S   f   nan nan [f]
3  Q   t   nan nan [t]

The closest I've been able to get is using iloc, but I'm unable to apply a conditional to it properly:

df['e'] = df.iloc[:, 1:].values.tolist()

The above results in arrays that contain all the column values, including the NaNs.
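
For reference, a minimal sketch that rebuilds the example frame (assuming the t/f entries are plain strings and the blank cells are np.nan):

import numpy as np
import pandas as pd

# Rebuild the example frame: 't'/'f' as strings, missing cells as NaN
df = pd.DataFrame({
    'a': ['S', 'S', 'S', 'Q'],
    'b': ['t', 't', 'f', 't'],
    'c': ['f', 't', np.nan, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

# The iloc attempt above keeps every value, NaNs included
print(df.iloc[:, 1:].values.tolist())
# [['t', 'f', nan], ['t', 't', nan], ['f', nan, nan], ['t', nan, nan]]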

Amir Charkhi
protommxx
  • The various solutions I've come across either have strings and not numbers, or require specifying each column individually. But since I have over a dozen columns, I was wondering if there was a cleaner way that lets me specify a span. – protommxx May 10 '22 at 01:33

2 Answers


You could use a nested list comprehension that exploits the fact that NaN is not equal to itself to filter out the NaNs:

df['e'] = [[x for x in ary if x==x] for ary in df.iloc[:,-3:].to_records(index=False)]

Output:

   a  b    c   d       e
0  S  t    f NaN  [t, f]
1  S  t    t NaN  [t, t]
2  S  f  NaN NaN     [f]
3  Q  t  NaN NaN     [t]
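
The x==x test works because NaN never compares equal to itself; a quick sketch of the idea:

import numpy as np

nan = float('nan')
print(nan == nan)                   # False: NaN never equals itself
print(np.nan == np.nan)             # also False

vals = ['t', 'f', np.nan]
print([x for x in vals if x == x])  # ['t', 'f'] -- the NaN drops out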
    What!? NaN not equal NaN? That's some crazy talk right there. That's a valuable tool to have in the tool belt. Too cool. – jch May 10 '22 at 01:54
  • @jch yeah, you can read more [here](https://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values/1573715#1573715). –  May 10 '22 at 02:18

IMHO a bit more readable version:

df['new_col_name'] = df.iloc[:,-3:].apply(lambda ser: ser.dropna().to_list(), axis=1)
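
Starting from the original four-column frame, this gives the same lists (the column name here is arbitrary):

print(df['new_col_name'].tolist())
# [['t', 'f'], ['t', 't'], ['f'], ['t']]

Note that apply with axis=1 calls the lambda once per row in Python, so on very wide or long frames the list comprehension in the other answer will usually be faster; this version trades a little speed for readability.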
Anton Frolov