Efficiently labelling a column that contains repeated elements

Question

I have a DataFrame with a column of author names, some of which repeat. I want to add a parallel column that assigns a unique number to each distinct author name, so that every row with the same name gets the same number (for simplicity, assume the numbers are consecutive whole numbers: 0, 1, 2, 3, and so on).

I can do this with nested `for` loops, but with 57,000 records and roughly 500 unique authors it takes far too long. Is there a faster way to do this?

For example, the original DataFrame contains:

| **Author** |
|------------|
| Name 1     |
| Name 2     |
| Name 1     |
| Name 3     |

I want another column added next to it, such that:

| **Author** | **AuthorID** |
|------------|--------------|
| Name 1     | 1            |
| Name 2     | 2            |
| Name 1     | 1            |
| Name 3     | 3            |
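The original loop code isn't shown, so this is not a fix of it. It is a minimal sketch, assuming the goal is simply to number names in order of first appearance. A single pass with a dictionary lookup per row is enough, so the work grows with the number of rows rather than rows times unique names. `label_authors` is a hypothetical helper name; it numbers from 0 as in the problem statement (add 1 to reproduce the 1-based example table).

```python
import pandas as pd

def label_authors(authors):
    """Hypothetical helper: give each distinct name an integer ID.

    IDs are assigned in order of first appearance, starting at 0.
    One dict lookup per row, so a single pass over the data.
    """
    ids = {}      # name -> assigned integer
    labels = []   # one label per input row
    for name in authors:
        if name not in ids:
            ids[name] = len(ids)  # next unused integer
        labels.append(ids[name])
    return labels

df = pd.DataFrame({"Author": ["Name 1", "Name 2", "Name 1", "Name 3"]})
df["AuthorID"] = label_authors(df["Author"])
print(df)
```

This is still a Python-level loop; the vectorized pandas calls suggested in the comments do the same job without one.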

It sounds like [How to add sequential counter column on groups using Pandas groupby](https://stackoverflow.com/q/23435270/15497888) `df['id col'] = df.groupby('Author Name').cumcount()` — Henry Ecker, Oct 29 '21 at 05:24
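Running the suggested `cumcount()` line on the four-row example from the question (with the column named `Author Name` to match the comment) shows what it actually computes: a running count within each name, not an ID for the name.

```python
import pandas as pd

df = pd.DataFrame({"Author Name": ["Name 1", "Name 2", "Name 1", "Name 3"]})

# cumcount() numbers rows *within* each group: 0 for a name's first
# occurrence, 1 for its second, and so on. It does not give the name an ID.
df["id col"] = df.groupby("Author Name").cumcount()
print(df)
```

Here the second `Name 1` row gets 1 and every other row gets 0, which is why it doesn't match the requested `AuthorID` column.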
Actually, it is a little different from `cumcount()`. I don't want the cumulative count of occurrences of each name. Instead, I want to assign a unique number to each distinct name; if a name repeats in the column, its number should repeat too (in another column next to it). — Joebevo, Oct 29 '21 at 08:31
Then either `df['id col'] = df.groupby('Author Name', sort=False).ngroup()` or `df['id col'] = df['Author Name'].factorize()[0]` — Henry Ecker, Oct 29 '21 at 12:06
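A sketch of both suggestions on the question's example, using the column name `Author` from the example rather than `'Author Name'` from the comment. Both calls are vectorized, so there is no per-row Python loop. They return 0-based IDs in order of first appearance; the `+ 1` below is only there to reproduce the 1-based numbering in the question's example table.

```python
import pandas as pd

df = pd.DataFrame({"Author": ["Name 1", "Name 2", "Name 1", "Name 3"]})

# Option 1: one number per group; sort=False numbers the groups in order
# of first appearance instead of alphabetically.
via_ngroup = df.groupby("Author", sort=False).ngroup()

# Option 2: factorize() returns (codes, uniques); codes are 0-based and
# assigned in order of first appearance, so repeated names share a code.
codes, uniques = pd.factorize(df["Author"])

# The question's example numbers authors from 1, so shift the 0-based codes.
df["AuthorID"] = codes + 1

print(df)
print(list(uniques))  # position i holds the name whose code is i
```

`factorize` also returns the distinct names, which works as a lookup table from ID back to author.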
