Assign a value to duplicates in Python

Question

I'd like to assign a running number to duplicate rows in a pandas DataFrame. The DataFrame is:

import pandas as pd

data_1 = {'ID': ['001', '003', '001', '002', '002', '002'], 'Name': ["XX1", "XX3", "XX1", "XX2", "XX2", "XX2"]}
df = pd.DataFrame(data_1)

The result should look like df2:

output = {'ID': ['001', '003', '001', '002', '002', '002'], 'Name': ["XX1", "XX3", "XX1", "XX2", "XX2", "XX2"], "Number": [1, 1, 2, 1, 2, 3]}
df2 = pd.DataFrame(output)

How can I auto-increment "Number" for each repeated "ID"?

This is rank within a group. There are pandas methods for doing this, google them. — Barmar, Oct 04 '21 at 22:10
Does https://stackoverflow.com/questions/66489613/pandas-group-by-and-rank-within-group-based-on-multiple-columns help? — Karl Knechtel, Oct 04 '21 at 22:15

score 0 · Answer 1 · answered Oct 04 '21 at 22:16


Following @Barmar's comment, ranking within each ID group answers this. More details are found here

df["rank"] = df.groupby("ID").rank("first", ascending=False)
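A minimal sketch of the same rank-within-group idea, assuming "order of appearance" is the ordering wanted. It ranks a numeric helper column of row positions (the name "pos" is hypothetical, not from the answer) instead of the string columns, so the ranking only depends on row order, and it casts the float ranks that rank() returns to integers.

```python
import pandas as pd

data_1 = {'ID': ['001', '003', '001', '002', '002', '002'],
          'Name': ["XX1", "XX3", "XX1", "XX2", "XX2", "XX2"]}
df = pd.DataFrame(data_1)

# "pos" is a hypothetical helper column holding each row's position (0, 1, 2, ...).
# Ranking positions within each ID numbers the rows in the order they appear.
ranks = (
    df.assign(pos=range(len(df)))
      .groupby("ID")["pos"]
      .rank(method="first")
)

# rank() returns floats (1.0, 2.0, ...); cast to int to match the question's df2.
df["Number"] = ranks.astype(int)
print(df)
```

The helper column only exists inside the assign() chain, so df itself ends up with just ID, Name and Number.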

answered Oct 04 '21 at 22:16 by Hummer

score 0 · Accepted Answer · answered Oct 04 '21 at 22:18

You can use groupby + cumcount, adding 1 because cumcount starts counting at 0:

df['Number'] = df.groupby('ID').cumcount().add(1)

Output:

    ID Name  Number
0  001  XX1       1
1  003  XX3       1
2  001  XX1       2
3  002  XX2       1
4  002  XX2       2
5  002  XX2       3
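A self-contained version of the accepted answer, checked against the expected df2 from the question. It also prints the raw cumcount() values to show why the answer adds 1. Using pd.testing.assert_frame_equal for the comparison is an illustrative choice, not part of the original answer.

```python
import pandas as pd

data_1 = {'ID': ['001', '003', '001', '002', '002', '002'],
          'Name': ["XX1", "XX3", "XX1", "XX2", "XX2", "XX2"]}
df = pd.DataFrame(data_1)

# Expected result, as given in the question.
output = {'ID': ['001', '003', '001', '002', '002', '002'],
          'Name': ["XX1", "XX3", "XX1", "XX2", "XX2", "XX2"],
          "Number": [1, 1, 2, 1, 2, 3]}
df2 = pd.DataFrame(output)

# cumcount() numbers the rows of each ID group 0, 1, 2, ... in order of appearance.
print(df.groupby('ID').cumcount().tolist())  # [0, 0, 1, 0, 1, 2]

# Shift to a 1-based count, as in the accepted answer.
df['Number'] = df.groupby('ID').cumcount().add(1)

# Raises AssertionError if values, dtypes, index or columns differ from df2.
pd.testing.assert_frame_equal(df, df2)
```

Unlike rank(), cumcount() already returns integers, so no cast is needed.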
