1

I'm interested in assigning values to duplicate rows in a Pandas dataframe as below. The dataframe is below:-

data_1 = {'ID': ['001', '003', '001','002','002','002'], 'Name': ["XX1", "XX3", "XX1", "XX2", "XX2", "XX2"]} 
df = pd.DataFrame(data_1)  

The output should be something related to the output of df2.

output = {'ID': ['001', '003', '001','002','002','002'], 'Name': ["XX1", "XX3", "XX1", "XX2", "XX2", "XX2"],"Number": [1, 1, 2, 1, 2,3]} 
df2 = pd.DataFrame(output)  

How can I autoincrement the "Number" on duplicated "ID"?

Hummer
  • 429
  • 1
  • 3
  • 16

2 Answers2

0

As per @Barmar's response, this seems to have answered it. More details are found here

df["rank"] = df.groupby("ID").rank("first", ascending=False)
Hummer
  • 429
  • 1
  • 3
  • 16
0

You can use groupby + cumcount (adding 1 as the counts are from 0):

df['Number'] = df.groupby('ID').cumcount().add(1)

Output:

    ID Name  Number
0  001  XX1       1
1  003  XX3       1
2  001  XX1       2
3  002  XX2       1
4  002  XX2       2
5  002  XX2       3
mozway
  • 194,879
  • 13
  • 39
  • 75