I have a dataframe (called 'df' in the code below) that contains 3 columns: X, Y and Z.
X      Y      Z
NaN    NaN    ZVal1
NaN    NaN    ZVal2
XVal1  NaN    NaN
NaN    YVal1  NaN
Each column carries data of one specific type (X, Y or Z), so if a row has data in column X, columns Y and Z will be NaN for that row, because the row's value is of type X.
If you combine all 3 columns, they "slide into each other" neatly and you will get values all the way down.
How can I combine these 3 columns into a 4th column so that NaN values are ignored and we just get whatever single value exists in the 3 columns for that row?
Expected output:
X      Y      Z      XYZ
NaN    NaN    ZVal1  ZVal1
NaN    NaN    ZVal2  ZVal2
XVal1  NaN    NaN    XVal1
NaN    YVal1  NaN    YVal1
Dataframe code:
import pandas as pd
import numpy as np
# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame(columns=['X', 'Y', 'Z'], data=[[np.nan, np.nan, 'ZVal1'], [np.nan, np.nan, 'ZVal2'], ['XVal1', np.nan, np.nan], [np.nan, 'YVal1', np.nan]])
Right now I am trying to do something along the lines of:
df['XYZ'] = df['X'].astype(str) + df['Y'].astype(str) + df['Z'].astype(str)
but that turns each NaN into the string 'nan' and concatenates it with the real value, so I get results like 'nannanZVal1' instead of 'ZVal1'.
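
For reference, here is a minimal sketch of one way the desired result might be produced. It assumes each row has at most one non-NaN value among X, Y and Z, as described above. It uses Series.combine_first, which keeps the caller's value where present and otherwise falls back to the argument's value. This is only one possible approach, not necessarily the best one.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame(
    columns=['X', 'Y', 'Z'],
    data=[[np.nan, np.nan, 'ZVal1'],
          [np.nan, np.nan, 'ZVal2'],
          ['XVal1', np.nan, np.nan],
          [np.nan, 'YVal1', np.nan]],
)

# Take X where it is not NaN, otherwise Y, otherwise Z.
# Assumption: at most one of the three columns is populated per row;
# if several were populated, X would win over Y, and Y over Z.
df['XYZ'] = df['X'].combine_first(df['Y']).combine_first(df['Z'])

print(df)
```

If all three columns were NaN in some row, XYZ would simply stay NaN for that row.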