Hi, I am trying to rank a column in a dataframe, turning its values into ranking positions.

Below is what I am trying to create. People with the same number of fruits sold should get the same ranking, so that the rankings contain no decimals when I sort by them. Can anyone advise me?

person | number of fruits sold | ranking
 A     |          5            |    2
 B     |          6            |    1
 C     |          2            |    4
 D     |          5            |    2
 E     |          3            |    3
jpp
cwerwf
  • Possible duplicate of [Python pandas rank/sort based on another column that differs for each input](https://stackoverflow.com/questions/45763829/python-pandas-rank-sort-based-on-another-column-that-differs-for-each-input) – Rahul Agarwal Sep 10 '18 at 10:07
  • @RahulAgarwal, I don't think that's a good dup target. There's no `GroupBy` involved here. – jpp Sep 10 '18 at 10:12
  • Related: [Pandas: convert categories to numbers](https://stackoverflow.com/questions/38088652/pandas-convert-categories-to-numbers) – jpp Sep 10 '18 at 10:25

2 Answers

You can use `pd.factorize`. A few tricks here: negate your series so that the largest value gets the first rank, specify `sort=True`, and add 1 to get your desired result, because the codes start at 0.

df['ranking'] = pd.factorize(-df['number of fruits sold'], sort=True)[0] + 1

Result:

    person  number of fruits sold  ranking
0   A                           5        2
1   B                           6        1
2   C                           2        4
3   D                           5        2
4   E                           3        3
jpp
  • Hi, I got the answer, but can I get an explanation of how `pd.factorize` works? – cwerwf Sep 10 '18 at 10:50
  • @cwerwf, The best explanation I can find is in the [docs](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.factorize.html): `Encode the object as an enumerated type or categorical variable. This method is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values.` – jpp Sep 10 '18 at 10:51
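
To make the comment above more concrete, here is a minimal sketch of what `pd.factorize` returns for the data in the question. The variable names `codes` and `uniques` are chosen here for illustration; the dataframe is rebuilt from the table in the question.

```python
import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "person": ["A", "B", "C", "D", "E"],
    "number of fruits sold": [5, 6, 2, 5, 3],
})

# factorize returns two things: an integer code per row, and the distinct values.
# With sort=True the distinct values are sorted ascending, so negating the
# series first means the largest original value receives code 0.
codes, uniques = pd.factorize(-df["number of fruits sold"], sort=True)

print(list(uniques))  # distinct negated values, sorted ascending
print(list(codes))    # position of each row's value within `uniques`

# Codes start at 0, so add 1 to get rankings that start at 1.
df["ranking"] = codes + 1
print(df)
```

Rows with equal values map to the same entry in `uniques`, which is why tied people share a rank and the codes are always whole numbers.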

Use `Series.rank` with `method='dense'`:

df['ranking'] = df['number of fruits sold'].rank(method='dense', ascending=False).astype(int)
print(df)
  person  number of fruits sold  ranking
0      A                      5        2
1      B                      6        1
2      C                      2        4
3      D                      5        2
4      E                      3        3
jezrael
  • Can you comment on why `int` conversion is required? The docs say that ranks are 1 to *n*. So I believe `rank` should only ever output integers. For dense specifically, `rank always increases by 1 between groups`. – jpp Sep 10 '18 at 10:19
  • @jpp - Hmmm, no idea why output is float. – jezrael Sep 10 '18 at 10:44
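
On the float question in the comments above: `Series.rank` returns a `float64` series whichever method is chosen. A likely reason (an assumption here, not something confirmed in this thread) is that the default `method='average'` can produce fractional ranks for ties, and missing values are kept as `NaN` by default. A small sketch, with series names picked only for illustration:

```python
import pandas as pd

sold = pd.Series([5, 6, 2, 5, 3])

# Default method='average': the two 5s occupy positions 2 and 3, so both get 2.5.
average_ranks = sold.rank(ascending=False)
print(average_ranks.tolist())

# method='dense' only produces whole numbers, but the dtype is still float64.
dense_ranks = sold.rank(method="dense", ascending=False)
print(dense_ranks.dtype)

# With no missing values, casting to int is safe.
print(dense_ranks.astype(int).tolist())

# A missing value stays NaN in the ranks, which a plain int column cannot hold.
with_gap = pd.Series([5, None, 2]).rank(method="dense", ascending=False)
print(with_gap.tolist())
```

So `.astype(int)` in the answer is only there to tidy the display; it would fail if the column contained missing values.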