1

Here is an example of my dataframe (my actual dataframe has 20+ columns and 100+ rows)

df = pd.DataFrame([['Jim', 93, 87, 66], 
                   ['Bob', 88, 90, 65],
                   ['Joe', 72, 100, 70]],
                   columns=['Name', 'Score1', 'Score2', 'Score3'])


Name  Score1  Score2  Score3
Jim       93      87      66
Bob       88      90      65
Joe       72     100      70

I want to create a new table which shows the rank of each score in a column. For example, the desired output would be:

Name  Score1  Score2  Score3
Jim        1       3       2
Bob        2       2       3
Joe        3       1       1

Is it possible to achieve this in pandas by looping through every column?

  • What I can think right now is sort dataframe by a colum as in https://stackoverflow.com/a/53302774/9475509 and then find index of some person, store it, and do that again with other columns. – dudung Mar 06 '23 at 10:55

2 Answers2

1

You can use filter to get the Score columns, then rank with a dense method in descending order, finally combine_first the other columns:

out = (df.filter(like='Score')
         .rank(method='dense', ascending=False)
         .convert_dtypes()  # optional, to have integers
         .combine_first(df)
       )

Output:

  Name  Score1  Score2  Score3
0  Jim       1       3       2
1  Bob       2       2       3
2  Joe       3       1       1
mozway
  • 194,879
  • 13
  • 39
  • 75
1

Yes, it is possible to achieve the desired output in Pandas without looping through every column. One approach is to use the rank method in Pandas to rank the values in each column separately.

# create a new dataframe to store the rankings
rank_df = pd.DataFrame(df['Name'])

# loop through each score column and rank the values
for col in df.columns[1:]:
    rank_df[col] = df[col].rank(ascending=False)
David
  • 26
  • 1
  • Hi @David, I've been having a problem with this code recently I hope you can help with. In my dataframe, there are 35 subjects, 10 of which have `Score2 = 100`. For some reason, this code outputs that they are have a rank of 6 for Score2, when they should all have a rank of 1. Do you know why this is? – code_machine Mar 22 '23 at 11:18
  • 1
    The rank function in pandas has a parameter called method which is set to "average" by default. This means that if there are ties in the data, the ranks are assigned by averaging the positions that the values would occupy. So in your case, since there are 10 values that are equal to 100 in the 'Score2' column, the rank of the first value would be (1+2+3+4+5+6+7+8+9+10)/10 = 5.5, and the rank of the subsequent values would also be 5.5 (->6). Try to set the 'method' parameter to "min", which will assign the minimum rank to each group of tied values – David Mar 23 '23 at 12:43