How to combine multiple DataFrame columns into one when each column has NaN values

Question

I have a DataFrame called 'main_df' that contains three columns: X, Y and Z.

X        Y       Z
NaN      NaN     ZVal1
NaN      NaN     ZVal2
XVal1    NaN     NaN
NaN      YVal1   NaN

Each column carries data of a specific type (X, Y or Z), and each row holds data of only one type: if a row has data in column X, columns Y and Z are NaN for that row. In other words, every row has exactly one non-NaN value.
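
Because the stack-based answer below relies on this one-value-per-row guarantee, it may be worth checking it on the real data before combining the columns. A minimal sketch, assuming the sample data from the question (the names non_null_per_row and one_value_per_row are only illustrative):

```python
import numpy as np
import pandas as pd

# Sample data from the question: one non-NaN value per row
df = pd.DataFrame(
    columns=["X", "Y", "Z"],
    data=[
        [np.nan, np.nan, "ZVal1"],
        [np.nan, np.nan, "ZVal2"],
        ["XVal1", np.nan, np.nan],
        [np.nan, "YVal1", np.nan],
    ],
)

# Count the non-NaN cells in each row; the guarantee holds if every count is 1
non_null_per_row = df.notna().sum(axis=1)
one_value_per_row = bool(non_null_per_row.eq(1).all())
print(one_value_per_row)  # True
```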

If you combine all three columns, they "slide into each other" neatly and you get a value in every row.

How can I combine these three columns into a fourth column so that the NaN values are ignored and the new column just holds whichever single value exists in that row?

Expected output:

X        Y       Z           XYZ
NaN      NaN     ZVal1       ZVal1
NaN      NaN     ZVal2       ZVal2
XVal1    NaN     NaN         XVal1    
NaN      YVal1   NaN         YVal1

Dataframe code:

 import pandas as pd
 import numpy as np
 df = pd.DataFrame(columns=['X', 'Y', 'Z'],
                   data=[[np.nan, np.nan, 'ZVal1'], [np.nan, np.nan, 'ZVal2'],
                         ['XVal1', np.nan, np.nan], [np.nan, 'YVal1', np.nan]])

Right now I am trying something along the lines of:

 df['XYZ'] = df['X'].astype(str) + df['Y'].astype(str) + df['Z'].astype(str)

but that also pulls the NaN values into one long string (e.g. 'nannanZVal1').
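
The extra text comes from astype(str), which (at least in pandas 1.x/2.x with object columns) turns each NaN into the literal string 'nan'. A minimal sketch of a fix that keeps the concatenation idea, assuming the sample frame from the question: replace NaN with an empty string before adding the columns.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    columns=["X", "Y", "Z"],
    data=[
        [np.nan, np.nan, "ZVal1"],
        [np.nan, np.nan, "ZVal2"],
        ["XVal1", np.nan, np.nan],
        [np.nan, "YVal1", np.nan],
    ],
)

# fillna("") turns every NaN into an empty string, so the empty cells add
# nothing to the concatenation and only the row's single value remains
df["XYZ"] = df["X"].fillna("") + df["Y"].fillna("") + df["Z"].fillna("")
print(df["XYZ"].tolist())  # ['ZVal1', 'ZVal2', 'XVal1', 'YVal1']
```

An all-NaN row would simply yield '', but if the one-value-per-row guarantee were ever broken, this would silently glue two values together.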

https://stackoverflow.com/questions/31828240/first-non-null-value-per-row-from-a-list-of-pandas-columns — Chris, Jun 16 '21 at 16:16

Mustafa Aydın · Accepted Answer · answered Jun 16 '21 at 16:29

With stack:

df["XYZ"] = df.stack().values

to get

>>> df

       X      Y      Z    XYZ
0    NaN    NaN  ZVal1  ZVal1
1    NaN    NaN  ZVal2  ZVal2
2  XVal1    NaN    NaN  XVal1
3    NaN  YVal1    NaN  YVal1

This works because you guarantee exactly one non-NaN value per row and stack drops NaNs by default, so the stacked values line up one-to-one with the rows.
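
A variant that does not depend on the NaN-dropping default may be safer. As far as I know, pandas 2.1 added a new stack implementation (future_stack=True) that keeps NaN cells and is slated to become the default, in which case df.stack().values would have more entries than df has rows. The sketch below is an editorial alternative, not the answer's own method: it adds an explicit dropna() and assigns by row label instead of by position.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    columns=["X", "Y", "Z"],
    data=[
        [np.nan, np.nan, "ZVal1"],
        [np.nan, np.nan, "ZVal2"],
        ["XVal1", np.nan, np.nan],
        [np.nan, "YVal1", np.nan],
    ],
)

# stack() returns a Series indexed by (row label, column label); dropna()
# removes the NaN cells whether or not this pandas version already did
stacked = df.stack().dropna()

# Dropping the column level leaves the original row labels, so the assignment
# aligns by index rather than relying on the positional order of .values
df["XYZ"] = stacked.droplevel(1)
print(df["XYZ"].tolist())  # ['ZVal1', 'ZVal2', 'XVal1', 'YVal1']
```

Because the assignment aligns on row labels, a row with no value at all gets NaN instead of causing a length mismatch; rows with more than one value still break the approach.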

Another way with fancy indexing:

df["XYZ"] = df.to_numpy()[np.arange(len(df)),
                          df.columns.get_indexer(df.notna().idxmax(axis=1))]

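which, for each row, finds the label of the first non-NaN column with notna().idxmax(axis=1), converts that label to a column position with get_indexer, and selects the cell at that position from the underlying NumPy array.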
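
Broken into steps, the same one-liner looks like this (a sketch on the question's sample data; the intermediate names are only for illustration):

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    columns=["X", "Y", "Z"],
    data=[
        [np.nan, np.nan, "ZVal1"],
        [np.nan, np.nan, "ZVal2"],
        ["XVal1", np.nan, np.nan],
        [np.nan, "YVal1", np.nan],
    ],
)

# 1. notna() gives a boolean mask; idxmax(axis=1) returns, per row, the label
#    of the first True column, i.e. the column that holds the value
first_valid_col = df.notna().idxmax(axis=1)  # labels: Z, Z, X, Y

# 2. Translate those labels into positional column numbers
col_positions = df.columns.get_indexer(first_valid_col)  # positions: 2, 2, 0, 1

# 3. Pair each row position with its column position and index the array
row_positions = np.arange(len(df))
df["XYZ"] = df.to_numpy()[row_positions, col_positions]
print(df["XYZ"].tolist())  # ['ZVal1', 'ZVal2', 'XVal1', 'YVal1']
```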

score 1 · Answer 2 · answered Jun 16 '21 at 16:11


Try:

df["XYZ"] = df.apply(lambda x: x[x.notna()].iloc[0], axis=1)
print(df)

Prints:

       X      Y      Z    XYZ
0    NaN    NaN  ZVal1  ZVal1
1    NaN    NaN  ZVal2  ZVal2
2  XVal1    NaN    NaN  XVal1
3    NaN  YVal1    NaN  YVal1
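
The apply approach assumes every row has a value: for an all-NaN row, x[x.notna()] is empty and taking its first element raises an error. A hedged sketch that returns NaN for such rows instead, using Series.first_valid_index (the extra all-NaN row and the helper name first_value are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with an extra all-NaN row
df = pd.DataFrame(
    columns=["X", "Y", "Z"],
    data=[
        [np.nan, np.nan, "ZVal1"],
        ["XVal1", np.nan, np.nan],
        [np.nan, np.nan, np.nan],
    ],
)

def first_value(row):
    # first_valid_index() gives the label of the first non-NaN cell,
    # or None when the whole row is NaN
    label = row.first_valid_index()
    return row[label] if label is not None else np.nan

df["XYZ"] = df.apply(first_value, axis=1)
print(df["XYZ"].tolist())  # ['ZVal1', 'XVal1', nan]
```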

Or:

df["XYZ"] = df.bfill(axis=1)["X"]
print(df)
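
The bfill version works because bfill(axis=1) copies each row's next non-NaN value leftwards, so after the fill the X column holds the row's first non-NaN value. A sketch showing the intermediate frame on the question's sample data:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    columns=["X", "Y", "Z"],
    data=[
        [np.nan, np.nan, "ZVal1"],
        [np.nan, np.nan, "ZVal2"],
        ["XVal1", np.nan, np.nan],
        [np.nan, "YVal1", np.nan],
    ],
)

# bfill(axis=1) fills each NaN with the next non-NaN value to its right in the
# same row; cells with nothing to their right (e.g. Y and Z in row 2) stay NaN
filled = df.bfill(axis=1)

# After the fill, the first column holds each row's first non-NaN value
df["XYZ"] = filled["X"]
print(df["XYZ"].tolist())  # ['ZVal1', 'ZVal2', 'XVal1', 'YVal1']
```

If the first column is not called X, filled.iloc[:, 0] selects it by position instead.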

answered by Andrej Kesely

Sir, I already posted the second method :) – Anurag Dabas Jun 16 '21 at 16:14
@AnuragDabas I think we are on the same wave :) – Andrej Kesely Jun 16 '21 at 16:16
