3

I have a pandas dataframe as below

Dominant_Topic  word    appearance
Topic 0         aaaawww         50
Topic 0         aacn            100
Topic 0         aaren           20
Topic 0         aarongoodwin    200
Topic 1         aaronjfentress  10
Topic 1         aaronrodger     20
Topic 1         aasmiitkap      30
Topic 2         aavqbketmh      10
Topic 2         ab              10
Topic 2         abandon         1

I want to get a dense rank for each partition, the partition column being the column named Dominant_Topic. The ranking should be based in descending manner on how many times the words in each partition appear. So the output would look like -

Dominant_Topic  word    appearance    dense_rank
Topic 0         aaaawww         50     3
Topic 0         aacn            100    2
Topic 0         aaren           20     4
Topic 0         aarongoodwin    200    1
Topic 1         aaronjfentress  10     3
Topic 1         aaronrodger     20     2
Topic 1         aasmiitkap      30     1
Topic 2         aavqbketmh      10     1
Topic 2         ab              10     1
Topic 2         abandon         1      2

How do I achieve this in Pandas ?

The SQL equivalent looks something like this -

select *, dense_rank() over( partition by dominant_topic order by appearance desc)
from table
Regressor
  • 1,843
  • 4
  • 27
  • 67

2 Answers2

5

This is built-in with groupby:

df['dense_rank'] = (df.groupby('Dominant_Topic')['appearance']
                      .rank(method='dense', ascending=False)
                      .astype(int)
                   )

Output:

  Dominant_Topic            word  appearance  dense_rank
0        Topic 0         aaaawww          50           3
1        Topic 0            aacn         100           2
2        Topic 0           aaren          20           4
3        Topic 0    aarongoodwin         200           1
4        Topic 1  aaronjfentress          10           3
5        Topic 1     aaronrodger          20           2
6        Topic 1      aasmiitkap          30           1
7        Topic 2      aavqbketmh          10           1
8        Topic 2              ab          10           1
9        Topic 2         abandon           1           2
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
0

window functions in pandas require you to specify the groupby explicity, but it's very similair.

df['rank'] = df.groupby(['Dominant_Topic'])['appearance'].rank(method='dense',
                                                              ascending=False)

  Dominant_Topic            word  appearance  rank
0        Topic 0         aaaawww          50   3.0
1        Topic 0            aacn         100   2.0
2        Topic 0           aaren          20   4.0
3        Topic 0    aarongoodwin         200   1.0
4        Topic 1  aaronjfentress          10   3.0
5        Topic 1     aaronrodger          20   2.0
6        Topic 1      aasmiitkap          30   1.0
7        Topic 2      aavqbketmh          10   1.0
8        Topic 2              ab          10   1.0
9        Topic 2         abandon           1   2.0
Umar.H
  • 22,559
  • 7
  • 39
  • 74