2

I am trying to rank a column (col 1) in pandas. If there is a tie, I want to look at another column (col 2) just for those records and do a tie breaker. IF they are same even in that column, I want to just assign the ranks randomly but each row must have a unique rank.

Example:

col1  |  col 2 | Rank
 20   |  3     | 3
 22   |  2     | 2
 22   |  2.5   | 1
 3    |  1     | 4
 3    |  1     | 5
Fred
  • 1,462
  • 8
  • 15
Nivi
  • 1,067
  • 4
  • 15
  • 28

1 Answers1

6
df['Rank'] = df.sort_values(by=['col1', 'col2'], ascending=False) \
               .reset_index() \
               .sort_values('index') \
               .index + 1

This code goes through these steps:

  1. Sorts the data frame into a new temporary one
  2. Resets the index, keeping the old one as a new column called 'index'. Make sure you don't have any starting columns named 'index'. If you do have one, the new column is called level_0. If you have both index and level_0, the code will throw an exception.
  3. Sorts the temporary dataframe according to the old index (so it's in the original order again)
  4. Uses the new index (which is in the rank order we want) as the new 'Rank' column. Plus 1 because your question seems to want ranks that start at 1, not 0
Fred
  • 1,462
  • 8
  • 15
  • Does this assign a unique rank for each row? I have 60997 rows and when I do pd.unique on the Rank column, I get only 60147 unique values. Thanks for responding. – Nivi Mar 14 '18 at 14:34
  • The last answer was actually completely wrong. Try again :) – Fred Mar 14 '18 at 14:55
  • Hahaha wow but this works perfectly! Thank you so much. Your previous answer however worked for the example I posted in my question. – Nivi Mar 14 '18 at 14:57
  • Thanks for the detailed answer. Very helpful. – Nivi Mar 14 '18 at 15:21