0

I have a dataframe like the one below where all IDs are unique and columns A, B and C hold values between 0 and 1.

df = pd.DataFrame({'A':[1, 0.7, 0, 0.5, 0.3, 0.3], 'B' :[0.6, 0.1, 0.4, 0.3, 0.9, 0.3], 'C':[0.6, 0.3, 0.6, 0.8, 0.9, 0.5], 'ID':['a', 'b', 'c', 'd', 'e', 'f']} )
A B C ID
1 0.6 0.6 a
0.7 0.1 0.3 b
0 0.4 0.6 c
0.5 0.3 0.8 d
0.3 0.9 0.9 e
0.3 0.3 0.5 f

I want to keep just the top n values of A, B and C, so that for n = 2 the dataframe looks as follow:

A B C ID
1 0.6 NaN a
0.7 NaN NaN b
NaN NaN 0.8 d
NaN 0.9 0.9 e

Doing df.set_index('ID')['A'].nlargest(2).reset_index() gives me:

ID A
a 1
b 0.7

Is there a simpler way than doing this 3 times and joining the datasets?

Sam
  • 79
  • 1
  • 7
  • Please read this: [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) e.g. use `df[['tags', 'elements']].iloc[:5].to_dict()` – Andreas Aug 25 '21 at 14:34

1 Answers1

3

Try with nlargest

out = df.set_index('ID').apply(lambda x : x.nlargest(n=2))
Out[654]: 
      A    B    C
ID               
a   1.0  0.6  NaN
b   0.7  NaN  NaN
d   NaN  NaN  0.8
e   NaN  0.9  0.9
BENY
  • 317,841
  • 20
  • 164
  • 234