This is largely a follow up question from this one.
Combine Pandas data frame column values into new column
I want to combine certain columns (not all) of a dateframe into a single column and pick which data to use in certain order of preference and ignore rest.
For instance I have this
df=pd.DataFrame({'A': ['alpha','beta', np.nan, np.nan],
'B':['alpha_1',np.nan, np.nan, 'delta_1'],
'C': [np.nan,'beta_2', 'gamma_2', 'delta_2']})
which gives
A B C
0 alpha alpha_1 NaN
1 beta NaN beta_2
2 NaN NaN gamma_2
3 NaN delta_1 delta_2
I want to combine them such that it will first prefer value in column A
and if that is NaN
or null
, it will look for value in column B
and if not finally in column C
.
so the final output in Column D
will be like
A B C D
0 alpha alpha_1 NaN alpha
1 beta NaN beta_2 beta
2 NaN NaN gamma_2 gamma_2
3 NaN delta_1 delta_2 delta_1
I can do df.fillna('').sum(axis=1)
as suggested in the solution but I am thinking before that I should make sure every row has only non NaN
value.
I am trying with boolean masking using isnull()
on each column but not able to progress too far on that. Any help would be great.