I am trying to generate a unique index column in my dataset.

I have a column in my dataset as follows: 665678, 665678, 665678, 665682, 665682, 665682, 665690, 665690

And I would like to generate a separately indexed column looking like this: 1, 1, 1, 2, 2, 2, 3, 3

I came across the post "How to index columns uniquely??", which describes exactly what I am trying to do. But since the solutions there are written for R, I would like to know how I can implement the same thing in Python using Pandas.

Thanks

  • Use `df.columns = pd.factorize(df.columns)[0] + 1` – jezrael Jan 03 '19 at 15:01
  • Or use `df.col1.astype('category').cat.codes + 1` – Scott Boston Jan 03 '19 at 15:01
  • Thank you both. Both solutions work, and having read [Pandas DENSE RANK](https://stackoverflow.com/questions/39357882/pandas-dense-rank), `factorize` seems to be the right option given that my data is sorted –  Jan 03 '19 at 15:18
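
For reference, here is a minimal sketch of the two approaches suggested in the comments, applied to the values of a single column rather than to the column labels. The column name `col` and the small unsorted series `s` are assumptions made for illustration. On sorted data both give the same result; on unsorted data `factorize` numbers values in order of first appearance, while category codes follow the sorted order of the values.

```python
import pandas as pd

# Sample data from the question; the column name "col" is assumed.
df = pd.DataFrame(
    {"col": [665678, 665678, 665678, 665682, 665682, 665682, 665690, 665690]}
)

# factorize returns (codes, uniques); codes start at 0, so add 1.
codes, uniques = pd.factorize(df["col"])
df["idx_factorize"] = codes + 1

# Category codes also start at 0 and follow the sorted order of the values.
df["idx_category"] = df["col"].astype("category").cat.codes + 1

print(df)

# Hypothetical unsorted data, to show where the two approaches differ.
s = pd.Series([30, 10, 30, 20])
by_appearance = (pd.factorize(s)[0] + 1).tolist()
by_sorted_value = (s.astype("category").cat.codes + 1).tolist()
print(by_appearance, by_sorted_value)
```

If the index should follow the order in which values first appear, `factorize` is probably the simpler choice; if it should follow the sorted values regardless of row order, the category codes do that.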

1 Answer

Use `groupby` with `ngroup()`, which numbers each group starting from 0, and add 1:

df.groupby('col').ngroup()+1

Output

0    1
1    1
2    1
3    2
4    2
5    2
6    3
7    3
dtype: int64
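
A self-contained version of the above could look like the following sketch. The DataFrame is built from the values in the question and the column name `col` is assumed. By default `groupby` sorts the keys, so the numbering follows the sorted values; passing `sort=False` should number the groups in order of first appearance instead.

```python
import pandas as pd

# Sample data from the question; the column name "col" is assumed.
df = pd.DataFrame(
    {"col": [665678, 665678, 665678, 665682, 665682, 665682, 665690, 665690]}
)

# ngroup() labels each row with its group number (0-based), so add 1.
df["idx"] = df.groupby("col").ngroup() + 1

# With sort=False the groups are numbered in order of first appearance.
df["idx_unsorted_keys"] = df.groupby("col", sort=False).ngroup() + 1

print(df)
```

Since the data in the question is already sorted, both columns come out identical here.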
Vivek Kalyanarangan