
I have a DataFrame as follows.

import pandas as pd

df = pd.DataFrame({'col1': ['a','b','c','c','d','e','a','h','i','a'],
                   'col2': ['3:00','3:00','4:00','4:00','3:00','5:00','5:00','3:00','3:00','2:00']})

df
Out[83]: 
  col1  col2
0    a  3:00
1    b  3:00
2    c  4:00
3    c  4:00
4    d  3:00
5    e  5:00
6    a  5:00
7    h  3:00
8    i  3:00
9    a  2:00    

I'd like to group by 'col1' and, within each group, assign a unique ID to each distinct value in col2, as follows:

col1  col2  ID
 a    2:00   0
 a    3:00   1
 a    5:00   2
 b    3:00   0
 c    4:00   0
 c    4:00   0
 ... 

I tried using pd.Categorical but couldn't quite get the result I wanted.

Henry Ecker
user4279562

1 Answer


We can use pd.factorize(), which, applied per group, numbers the distinct values of col2 in order of first appearance:

In [170]: df['ID'] = df.groupby('col1')['col2'].transform(lambda x: pd.factorize(x)[0])

In [171]: df
Out[171]:
  col1  col2  ID
0    a  3:00   0
1    b  3:00   0
2    c  4:00   0
3    c  4:00   0
4    d  3:00   0
5    e  5:00   0
6    a  5:00   1
7    h  3:00   0
8    i  3:00   0
9    a  2:00   2
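
The output above numbers values by first appearance, so for group 'a' the value 2:00 gets ID 2 rather than the 0 shown in the question. If the IDs should follow the sorted order of col2 instead, one possible variation is to pass sort=True to pd.factorize(). This is only a sketch: it assumes col2 holds strings, so the ordering is lexicographic (fine for the sample data, but '10:00' would sort before '2:00'). The name result is introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b', 'c', 'c', 'd', 'e', 'a', 'h', 'i', 'a'],
    'col2': ['3:00', '3:00', '4:00', '4:00', '3:00',
             '5:00', '5:00', '3:00', '3:00', '2:00'],
})

# sort=True makes factorize assign codes in sorted order of the unique
# values within each group, instead of order of first appearance.
df['ID'] = df.groupby('col1')['col2'].transform(
    lambda x: pd.factorize(x, sort=True)[0]
)

# Sorted view for display only; `result` is a hypothetical name.
result = df.sort_values(['col1', 'col2']).reset_index(drop=True)
print(result)
```

Because the comparison is on strings, times with mixed digit counts would need converting to a proper time or numeric type before factorizing.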
MaxU - stand with Ukraine