
I have a dataset that looks something like this:

col_1  col_2  col_3
    0    nan    nan
  nan      1    nan
  nan    nan    nan
  nan    nan      1

And I need to collapse those columns into something like this:

foo
0
1
nan
1
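
For reference, here's a minimal setup that reproduces the frame above (the column names and the columns list are assumptions based on the example):

import numpy as np
import pandas as pd

# Sample frame matching the example; values are floats with np.nan gaps.
df = pd.DataFrame({
    "col_1": [0, np.nan, np.nan, np.nan],
    "col_2": [np.nan, 1, np.nan, np.nan],
    "col_3": [np.nan, np.nan, np.nan, 1],
})
columns = ["col_1", "col_2", "col_3"]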

My first attempt was doing this:

df[columns].values[~df[columns].isna()]

But since there are rows where every value is NaN, those rows are dropped from the result.
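
For example, on the sample frame above this returns a flat array of just the three non-NaN values, so the all-NaN row vanishes and row alignment is lost:

df[columns].values[~df[columns].isna()]
# array([0., 1., 1.])  <- 3 values for 4 rows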

My second attempt was doing this:

def get_cols_or_nan(row):
    # Boolean mask of the non-NaN entries in this row.
    mask = ~row.isna()
    if np.any(mask):
        # Return the single non-NaN value by position.
        return row[mask].iloc[0]
    return float('nan')

df[columns].apply(get_cols_or_nan, axis=1)

But, for reasons I don't understand, this is significantly slower, to the point of being impractical for me.

Is there a more efficient way to collapse these columns? I am guaranteed that there is at most one non-NaN value in each row.

João Areias
  • what should happen if you also have a number in the first row for col2? – mozway May 11 '22 at 11:58
  • I hadn't seen the last sentence; in that case it's covered [here](https://stackoverflow.com/questions/31828240/first-non-null-value-per-row-from-a-list-of-pandas-columns) – mozway May 11 '22 at 12:03

2 Answers


If you have at most one non-NA value per row, you can use:

df.stack().droplevel(1).reindex_like(df)

output:

0    0.0
1    1.0
2    NaN
3    1.0
dtype: float64
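
To unpack why this works (a sketch using the sample frame from the question, relying on stack's classic behavior of dropping NaN by default):

s = df.stack()             # drops NaNs; index becomes (row, column) pairs
s = s.droplevel(1)         # discard the column level, keeping the row index
out = s.reindex_like(df)   # reinsert all-NaN rows, which come back as NaN

Note that the "at most one non-NA value per row" guarantee matters: a row with two non-NaN values would leave duplicate row labels after droplevel, and reindex_like would raise a duplicate-axis error.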
mozway

You can take the row-wise maximum with df.max(axis=1): NaN values are skipped, and since each row holds at most one non-NaN value, that maximum is exactly the value you want (all-NaN rows stay NaN).

out = df.max(axis=1).to_frame('foo')
print(out)

   foo
0  0.0
1  1.0
2  NaN
3  1.0
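
Since NaN-skipping is the default for pandas reductions, other reducers give the same result here; a couple of equivalent sketches:

out = df.min(axis=1).to_frame('foo')               # min works just as well as max
out = df.sum(axis=1, min_count=1).to_frame('foo')  # min_count=1 keeps all-NaN rows NaN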
Ynjxsjmh