
Let's say I have a dataframe like this:

   a   b   c   d 
0  S   t   f   nan
1  S   t   t   nan
2  S   f   nan nan
3  Q   t   nan nan

I want to combine the last 3 columns into a single column as an array, but exclude the NaN values, so I end up with something like the following:

   a   b   c   d   e
0  S   t   f   nan [t, f]
1  S   t   t   nan [t, t]
2  S   f   nan nan [f]
3  Q   t   nan nan [t]

The closest I've been able to get is using iloc, but I'm unable to apply a conditional to it properly:

df['e'] = df.iloc[:, 1:].values.tolist()

The above results in arrays that contain all the column values, including the NaNs.
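
For reference, a minimal sketch that rebuilds the example frame (assuming the t/f entries are plain strings and the blank cells are np.nan):

import numpy as np
import pandas as pd

# Rebuild the example frame: 't'/'f' as strings, missing cells as NaN
df = pd.DataFrame({
    'a': ['S', 'S', 'S', 'Q'],
    'b': ['t', 't', 'f', 't'],
    'c': ['f', 't', np.nan, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

# The iloc attempt above keeps every value, NaNs included
print(df.iloc[:, 1:].values.tolist())
# [['t', 'f', nan], ['t', 't', nan], ['f', nan, nan], ['t', nan, nan]]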

Amir Charkhi
protommxx
  • The various solutions I've come across either have strings and not numbers, or require specifying each column individually. But since I have over a dozen columns, I was wondering if there was a cleaner way that lets me specify a span. – protommxx May 10 '22 at 01:33

2 Answers


You could use a nested list comprehension that exploits the fact that NaN is not equal to itself to filter out the NaNs:

df['e'] = [[x for x in ary if x==x] for ary in df.iloc[:,-3:].to_records(index=False)]

Output:

   a  b    c   d       e
0  S  t    f NaN  [t, f]
1  S  t    t NaN  [t, t]
2  S  f  NaN NaN     [f]
3  Q  t  NaN NaN     [t]
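
The x==x test works because NaN never compares equal to itself; a quick sketch of the idea:

import numpy as np

nan = float('nan')
print(nan == nan)                   # False: NaN never equals itself
print(np.nan == np.nan)             # also False

vals = ['t', 'f', np.nan]
print([x for x in vals if x == x])  # ['t', 'f'] -- the NaN drops out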
    What!? NaN not equal NaN? That's some crazy talk right there. That's a valuable tool to have in the tool belt. Too cool. – jch May 10 '22 at 01:54
  • @jch yeah, you can read more [here](https://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values/1573715#1573715). –  May 10 '22 at 02:18

IMHO a bit more readable version:

df['new_col_name'] = df.iloc[:,-3:].apply(lambda ser: ser.dropna().to_list(), axis=1)
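
Starting from the original four-column frame, this gives the same lists (the column name here is arbitrary):

print(df['new_col_name'].tolist())
# [['t', 'f'], ['t', 't'], ['f'], ['t']]

Note that apply with axis=1 calls the lambda once per row in Python, so on very wide or long frames the list comprehension in the other answer will usually be faster; this version trades a little speed for readability.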
Anton Frolov