Create a new column with unique identifier for each group

Question

I need to create a new "identifier" column with a unique value for each combination of values in two columns. For example, rows where ID and phase are both the same (e.g. r1 and ph1) should share one identifier, while a different combination (e.g. r1 and ph2) should get a new, unique value.

df
ID   phase   side   values
r1   ph1     l      12
r1   ph1     r      34
r1   ph2     l      93
s4   ph3     l      21
s3   ph2     l      88
s3   ph2     r      54
...

I would need a new column (idx) like so:

new_df
ID   phase   side   values    idx
r1   ph1     l      12        1
r1   ph1     r      34        1
r1   ph2     l      93        2
s4   ph3     l      21        3
s3   ph2     l      88        4
s3   ph2     r      54        4
...

I've tried applying code from this question but could not find a way to increment the values in idx.

score 4 · Accepted Answer · answered Jun 07 '21 at 14:44

Try groupby with ngroup, adding 1 because ngroup numbers groups from 0. Use sort=False to ensure groups are enumerated in the order they appear in the DataFrame:

df['idx'] = df.groupby(['ID', 'phase'], sort=False).ngroup() + 1

df:

   ID phase side  values  idx
0  r1   ph1    l      12    1
1  r1   ph1    r      34    1
2  r1   ph2    l      93    2
3  s4   ph3    l      21    3
4  s3   ph2    l      88    4
5  s3   ph2    r      54    4
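
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the sample data from the question, rebuilt by hand. The idx_sorted column is a hypothetical name added only to show what the default sort=True would give instead:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "ID": ["r1", "r1", "r1", "s4", "s3", "s3"],
    "phase": ["ph1", "ph1", "ph2", "ph3", "ph2", "ph2"],
    "side": ["l", "r", "l", "l", "l", "r"],
    "values": [12, 34, 93, 21, 88, 54],
})

# sort=False: groups are numbered in order of first appearance
df["idx"] = df.groupby(["ID", "phase"], sort=False).ngroup() + 1

# Default (sort=True): groups are numbered in sorted key order,
# so (s3, ph2) is numbered before (s4, ph3)
df["idx_sorted"] = df.groupby(["ID", "phase"]).ngroup() + 1

print(df)
```

With this data, idx follows the row order (1, 1, 2, 3, 4, 4), while idx_sorted swaps the last two groups (1, 1, 2, 4, 3, 3), which is why sort=False matters if the numbering should follow the order of the DataFrame.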
