All,
I found a function called "combine_first()" in the pandas documentation and on Stack Overflow. It works well for a few simple examples. I got the code below to work by chaining "combine_first()" multiple times (6 in this case). Can someone help me find a more elegant solution?
The created variable "category_id" should contain the first non-missing value, starting with the last variable (category_id7) and working back to the first (category_id1). As soon as a category_id(x) is populated, category_id should take that value and stop checking the remaining columns. This applies to every row in the dataframe.
import numpy as np
import pandas as pd

d={'category_id1':[32991,32991,32991,32991,32991],
'category_id2':[22,22,22,22,22],
'category_id3':[33058,51,121,120,32438],
'category_id4':[np.nan,np.nan,np.nan,np.nan,np.nan],
'category_id5':[np.nan,np.nan,np.nan,np.nan,np.nan],
'category_id6':[np.nan,np.nan,np.nan,np.nan,np.nan],
'category_id7':[np.nan,np.nan,np.nan,np.nan,np.nan]
}
df=pd.DataFrame(data=d)
df['category_id']=df.category_id7.combine_first(df.category_id6).combine_first(df.category_id5).combine_first(df.category_id4).combine_first(df.category_id3).combine_first(df.category_id2).combine_first(df.category_id1)
print(df)
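For reference, here is a minimal sketch of two shorter ways to express the same "last populated column wins" rule, assuming the columns are always named category_id1 through category_id7. The names "cols", "coalesced", and "chained" are introduced here for illustration only. The first variant reorders the columns from 7 down to 1 and back-fills across each row, so the first column ends up holding the first non-missing value; the second folds "combine_first()" over the same column list with functools.reduce, which is equivalent to the hand-written chain.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'category_id1': [32991, 32991, 32991, 32991, 32991],
    'category_id2': [22, 22, 22, 22, 22],
    'category_id3': [33058, 51, 121, 120, 32438],
    'category_id4': [np.nan] * 5,
    'category_id5': [np.nan] * 5,
    'category_id6': [np.nan] * 5,
    'category_id7': [np.nan] * 5,
})

# Column names in priority order: category_id7 first, category_id1 last.
cols = [f'category_id{i}' for i in range(7, 0, -1)]

# Variant 1: back-fill along each row, then take the first column.
# After bfill(axis=1), column 0 holds the first non-missing value in priority order.
coalesced = df[cols].bfill(axis=1).iloc[:, 0]

# Variant 2: fold combine_first() over the columns instead of chaining by hand.
chained = reduce(lambda acc, col: acc.combine_first(df[col]), cols[1:], df[cols[0]])

df['category_id'] = coalesced
print(df)
```

Both variants return floats because the all-NaN columns force a float dtype. If a row has no populated column at all, the result for that row stays NaN.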