I have a dataframe like this:
df = pd.DataFrame({'asset_id': [10,10, 10, 20, 20, 20], 'method_id': ['p2','p3','p4', 'p3', 'p1', 'p2'], 'method_rank': [5, 2, 2, 2, 5, 1], 'conf_score': [0.8, 0.6, 0.8, 0.9, 0.7, 0.5]} , columns= ['asset_id', 'method_id','method_rank', 'conf_score'])
It looks like this:
asset_id method_id method_rank conf_score
0 10 p2 5 0.8
1 10 p3 2 0.6
2 10 p4 2 0.8
3 20 p3 2 0.9
4 20 p1 5 0.7
5 20 p2 1 0.5
I want to group the rows by asset id, and then give each row an overall rank based on method_rank
ascending and conf_score
descending.
ie. I want the result to look like this:
asset_id method_id method_rank conf_score overall_rank
5 20 p2 1 0.5 1.0
3 20 p3 2 0.9 2.0
2 10 p4 2 0.8 1.0
1 10 p3 2 0.6 2.0
0 10 p2 5 0.8 3.0
4 20 p1 5 0.7 3.0
How can I do this using group by and ranking in pandas? It looks like in pandas you can only do it based on one column, like
df["overall_rank"] = df.groupby('asset_id')['method_rank'].rank("first")
But I want to achieve something like
df["overall_rank"] = df.groupby('asset_id')[['method_rank', 'conf_score']].rank("first", ascending = [True, False])
How do I do this? I am aware that a hacky way is to first use sort_values
on the entire dataframe and then do groupby
, but sorting the rows of the entire dataframe seems too expensive when I only want to sort a few rows in each group.