1

I need to create a new "identifier column" with unique values for each combination of values of two columns. For example, the same "identifier" should be used when ID and phase are the same (e.g. r1 and ph1 [but a new, unique value should be added to the column when r1 and ph2])

df
ID   phase   side   values
r1   ph1     l      12
r1   ph1     r      34
r1   ph2     l      93
s4   ph3     l      21
s3   ph2     l      88
s3   ph2     r      54
...

I would need a new column (idx) like so:

new_df
ID   phase   side   values    idx
r1   ph1     l      12        1
r1   ph1     r      34        1
r1   ph2     l      93        2
s4   ph3     l      21        3
s3   ph2     l      88        4
s3   ph2     r      54        4
...

I've tried applying code from this question but could no achieve a way to increment the values in idx.

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
Oiko
  • 139
  • 9

1 Answers1

4

Try with groupby ngroup + 1, use sort=False to ensure groups are enumerated in the order they appear in the DataFrame:

df['idx'] = df.groupby(['ID', 'phase'], sort=False).ngroup() + 1

df:

   ID phase side  values  idx
0  r1   ph1    l      12    1
1  r1   ph1    r      34    1
2  r1   ph2    l      93    2
3  s4   ph3    l      21    3
4  s3   ph2    l      88    4
5  s3   ph2    r      54    4
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57