0

I have a Pandas dataframe below. I need to create a RANKING column where the ranking is based on SUM of SOLD_NO of STAFF_ID at UNIT level. This means that for each UNIT, the STAFF_ID is ranked in order of SUM of SOLD_NO in descending order.

i tried this code, but the it's not correct, mainly not all unit have a ranking 1.

df['RANKING'] = df.groupby(['UNIT'])['SOLD_NO'].transform(np.sum).rank(ascending=False, method='dense')

Is there anything i need to change?

enter image description here

Benny Frog
  • 11
  • 5

1 Answers1

1
import pandas as pd

# Your dataframe
data = {
    'UNIT': ['AA', 'AA', 'AA', 'AA','AA', 'BB', 'BB', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC'],
    'STAFF_ID': [111, 111, 222, 222, 222, 333, 333, 444, 444, 444, 444, 555, 555, 555, 555, 555, 666],
    'SOLD_NO': [2, 5, 1, 0, 10, 1, 1, 10, 4, 2, 6, 1, 1, 4, 1, 1, 8]
}

df = pd.DataFrame(data)

# Calculate the sum of SOLD_NO for each STAFF_ID in each UNIT
df_sum = df.groupby(['UNIT', 'STAFF_ID'])['SOLD_NO'].sum().reset_index()

# Calculate the ranking within each UNIT based on the sum of SOLD_NO
df_sum['RANKING'] = df_sum.groupby(['UNIT'])['SOLD_NO'].rank(ascending=False, method='dense')

# Merge the ranking back to the original dataframe
df = df.merge(df_sum[['UNIT', 'STAFF_ID', 'RANKING']], on=['UNIT', 'STAFF_ID'])

print(df)

Result:

   UNIT  STAFF_ID  SOLD_NO  RANKING
0    AA       111        2      2.0
1    AA       111        5      2.0
2    AA       222        1      1.0
3    AA       222        0      1.0
4    AA       222       10      1.0
5    BB       333        1      1.0
6    BB       333        1      1.0
7    CC       444       10      1.0
8    CC       444        4      1.0
9    CC       444        2      1.0
10   CC       444        6      1.0
11   CC       555        1      2.0
12   CC       555        1      2.0
13   CC       555        4      2.0
14   CC       555        1      2.0
15   CC       555        1      2.0
16   CC       666        8      2.0
Taras Drapalyuk
  • 473
  • 3
  • 6