I am struggling to get, for each row, the names of the top n (n=3 in my case) columns by value while ignoring NaNs. My dataset:
import numpy as np
import pandas as pd
x = {'ID':['1','2','3','4','5'],
'productA':[0.47, 0.65, 0.48, 0.58, 0.67],
'productB':[0.65, 0.47, 0.55, np.nan, np.nan],
'productC':[0.78, np.nan, np.nan, np.nan, np.nan],
'productD':[np.nan, np.nan, 0.25, np.nan, np.nan],
'productE':[0.12, np.nan, 0.47, 0.12, np.nan]}
df = pd.DataFrame(x)
My desired outcome:
ID | top3 |
---|---|
A1 | productC - productB - productA |
A2 | productA - productB |
A3 | productB - productA - productE |
A4 | productA - productE |
A5 | productA |
As you can see, if a row has fewer than 3 non-NaN values, it should keep however many there are, still sorted by value in descending order. I tried np.argsort, but it doesn't ignore the NaNs and instead lists the missing products in alphabetical order.
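
To make the target concrete, here is a minimal sketch of the behavior described above. It assumes a row-wise `apply` is acceptable for the data size, and the helper name `top_n_columns` is hypothetical (it is not part of pandas or of the original post). It is one possible approach, not necessarily the fastest one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['1', '2', '3', '4', '5'],
    'productA': [0.47, 0.65, 0.48, 0.58, 0.67],
    'productB': [0.65, 0.47, 0.55, np.nan, np.nan],
    'productC': [0.78, np.nan, np.nan, np.nan, np.nan],
    'productD': [np.nan, np.nan, 0.25, np.nan, np.nan],
    'productE': [0.12, np.nan, 0.47, 0.12, np.nan],
})

def top_n_columns(row, n=3):
    # Hypothetical helper: drop missing products, keep at most n of the
    # largest values (sorted descending), and join their column names.
    return ' - '.join(row.dropna().nlargest(n).index)

# Move ID into the index so only the numeric product columns are ranked.
values = df.set_index('ID')
result = values.apply(top_n_columns, axis=1).rename('top3').reset_index()
print(result)
```

Rows with fewer than three non-NaN values simply yield shorter strings, because `dropna()` removes the missing entries before `nlargest` ranks what is left.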