I have a dataframe called 'df' that contains 3 columns: X, Y and Z.

X        Y       Z
NaN      NaN     ZVal1
NaN      NaN     ZVal2
XVal1    NaN     NaN
NaN      YVal1   NaN

Each column carries data of one specific type (X, Y or Z), so if a row has data in column X, its Y and Z columns will be empty, because that row's data is of type X.

If you combine all 3 columns, they "slide into each other" neatly and you will get values all the way down.

How can I combine these 3 columns into a 4th column so that NaN values are ignored and we just get whatever single value exists in the 3 columns for that row?

Expected output:

X        Y       Z           XYZ
NaN      NaN     ZVal1       ZVal1
NaN      NaN     ZVal2       ZVal2
XVal1    NaN     NaN         XVal1    
NaN      YVal1   NaN         YVal1   

Dataframe code:

 import pandas as pd
 import numpy as np
 df = pd.DataFrame(columns=['X', 'Y', 'Z'],
                   data=[[np.nan, np.nan, 'ZVal1'], [np.nan, np.nan, 'ZVal2'],
                         ['XVal1', np.nan, np.nan], [np.nan, 'YVal1', np.nan]])

Right now I am trying to do something along the lines of:

 df['XYZ'] = df['X'].astype(str) + df['Y'].astype(str) + df['Z'].astype(str)

but that combines the NaN values into one long string.

  • https://stackoverflow.com/questions/31828240/first-non-null-value-per-row-from-a-list-of-pandas-columns – Chris Jun 16 '21 at 16:16

2 Answers

With stack:

df["XYZ"] = df.stack().values

to get

>>> df

       X      Y      Z    XYZ
0    NaN    NaN  ZVal1  ZVal1
1    NaN    NaN  ZVal2  ZVal2
2  XVal1    NaN    NaN  XVal1
3    NaN  YVal1    NaN  YVal1

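since you guarantee exactly one non-NaN value per row and stack drops NaNs by default. Note that this default belongs to the original stack implementation; with future_stack=True (the behaviour newer pandas versions move to), NaNs are kept, so df.stack().dropna().values is the safer spelling.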
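As a rough sketch of the same idea that does not rely on stack's own NaN handling, the NaNs can be dropped explicitly and the result aligned back on the row index. The name `stacked` is only illustrative, and the column list is spelled out on the assumption that the frame might already hold other columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    columns=["X", "Y", "Z"],
    data=[
        [np.nan, np.nan, "ZVal1"],
        [np.nan, np.nan, "ZVal2"],
        ["XVal1", np.nan, np.nan],
        [np.nan, "YVal1", np.nan],
    ],
)

# Stack to a long Series indexed by (row, column), then drop NaNs explicitly
# so the result is the same whether or not stack itself drops them.
stacked = df[["X", "Y", "Z"]].stack().dropna()

# Remove the column level so the values align on the original row index.
df["XYZ"] = stacked.droplevel(-1)
print(df)
```

Because the assignment aligns on the index rather than on position, this variant still assumes at most one non-NaN value per row (duplicate row labels would make the alignment fail).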


Another way with fancy indexing:

df["XYZ"] = df.to_numpy()[np.arange(len(df)),
                          df.columns.get_indexer(df.notna().idxmax(axis=1))]

which, for each row, finds the column position of the first non-NaN value and selects the value at that position.
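To make the one-liner easier to follow, here is a step-by-step sketch of the same lookup. The intermediate names (`cols`, `mask`, `first_label`, `col_pos`, `row_pos`) are just illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    columns=["X", "Y", "Z"],
    data=[
        [np.nan, np.nan, "ZVal1"],
        [np.nan, np.nan, "ZVal2"],
        ["XVal1", np.nan, np.nan],
        [np.nan, "YVal1", np.nan],
    ],
)
cols = ["X", "Y", "Z"]

# True where a value is present, False where it is NaN.
mask = df[cols].notna()

# Per row, the label of the first column holding True.
first_label = mask.idxmax(axis=1)

# Translate column labels into integer positions.
col_pos = df[cols].columns.get_indexer(first_label)

# One row position per row, paired element-wise with col_pos.
row_pos = np.arange(len(df))

# Pick exactly one cell per row from the underlying array.
df["XYZ"] = df[cols].to_numpy()[row_pos, col_pos]
print(col_pos)  # [2 2 0 1]
```

If a row were entirely NaN, its mask would be all False and idxmax would fall back to the first column, so that row would simply get NaN in XYZ.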

Mustafa Aydın

Try:

df["XYZ"] = df.apply(lambda x: x[x.notna()].iloc[0], axis=1)
print(df)

Prints:

       X      Y      Z    XYZ
0    NaN    NaN  ZVal1  ZVal1
1    NaN    NaN  ZVal2  ZVal2
2  XVal1    NaN    NaN  XVal1
3    NaN  YVal1    NaN  YVal1

Or:

df["XYZ"] = df.bfill(axis=1)["X"]
print(df)
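The bfill version works because a backward fill along axis=1 replaces each NaN with the next non-NaN value to its right, so the leftmost column ends up holding the first non-NaN value of every row. A small sketch of that, with a chained fillna as a cross-check (the names `filled` and `chained` are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    columns=["X", "Y", "Z"],
    data=[
        [np.nan, np.nan, "ZVal1"],
        [np.nan, np.nan, "ZVal2"],
        ["XVal1", np.nan, np.nan],
        [np.nan, "YVal1", np.nan],
    ],
)

# Backward-fill across columns: each NaN takes the next value to its right.
filled = df[["X", "Y", "Z"]].bfill(axis=1)

# The first column now carries the first non-NaN value of each row.
df["XYZ"] = filled.iloc[:, 0]

# Equivalent idea written as an explicit fallback chain: X, else Y, else Z.
chained = df["X"].fillna(df["Y"]).fillna(df["Z"])
print(df)
```

Unlike the stack approach, both forms tolerate rows with more than one non-NaN value: they just take the leftmost one.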
Andrej Kesely