I have a dataframe with multiple rows, some of which are duplicates. I want to keep the duplicates and assign a number to each row, starting at 100, so that duplicate rows share the same number. For example: 100, 101, 101, 102, 103, 103, 103, where the numbers listed more than once mark the duplicates.

I used some code I found on here, thanks to @Cleb. However, it is not quite what I am looking for, and I have been experimenting with it for a while to no avail. I'm still new to Python, so I may not even be using the correct method.

The code:

d = {ni: indi for indi, ni in enumerate(df, start=100)}
index = [d[ni] for ni in df]

The data:

data = { 'ID':['161464','146446','146446','368416','464344','464344','464344'],
'Name':['Jen','Zach','Zach','Rachel','Scott','Scott','Scott']}

The current output does account for the duplicates, but it takes the assigned number from the tail of each run of duplicates instead of the start, for example 100, 105, 105, 105, 105, 105, 106, and so on. Let me know if you have any questions, and thank you for any assistance.

1 Answer

You can use pandas.factorize to generate the unique id. It numbers each distinct value in order of first appearance, starting from 0, so you can add 100 to have ids from 100 on:

df['new_id'] = pd.factorize(df['ID'])[0]+100

output:

       ID    Name  new_id
0  161464     Jen     100
1  146446    Zach     101
2  146446    Zach     101
3  368416  Rachel     102
4  464344   Scott     103
5  464344   Scott     103
6  464344   Scott     103
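
For reference, here is a self-contained sketch that rebuilds the dataframe from the data in the question and applies the same idea. The `codes` and `uniques` names are only illustrative, and the `groupby(...).ngroup()` line is an alternative I am adding for comparison, not something the question asked for; with `sort=False` it should number groups in order of first appearance, just like `factorize`.

```python
import pandas as pd

# Data copied from the question.
data = {
    'ID': ['161464', '146446', '146446', '368416', '464344', '464344', '464344'],
    'Name': ['Jen', 'Zach', 'Zach', 'Rachel', 'Scott', 'Scott', 'Scott'],
}
df = pd.DataFrame(data)

# factorize returns (codes, uniques): codes are 0-based integers assigned
# in order of first appearance, so repeated IDs get the same code.
codes, uniques = pd.factorize(df['ID'])
df['new_id'] = codes + 100

# Alternative sketch: number the groups directly. sort=False keeps the
# order of first appearance instead of sorting the IDs.
df['new_id_alt'] = df.groupby('ID', sort=False).ngroup() + 100

print(df)
```

Both columns should hold the same values here. The dictionary approach in the question behaves differently because a later duplicate overwrites the number stored for an earlier one, so every duplicate ends up with the position of its last occurrence.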
mozway