
Suppose I have the following dataframe:

import pandas as pd  
  
df = pd.DataFrame({'a': [1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,4],
                   'b': [3,4,3,7,5,9,4,2,5,6,7,8,4,2,4,5,8,0]})
    a   b
0   1   3
1   1   4
2   1   3
3   2   7
4   2   5
5   2   9
6   2   4
7   2   2
8   3   5
9   3   6
10  3   7
11  3   8
12  4   4
13  4   2
14  4   4
15  4   5
16  4   8
17  4   0

I would like to add a new column c that counts from 1 to n within each group of equal values in column a, where n is the number of rows in that group, as follows:

    a   b   c
0   1   3   1
1   1   4   2
2   1   3   3
3   2   7   1
4   2   5   2
5   2   9   3
6   2   4   4
7   2   2   5
8   3   5   1
9   3   6   2
10  3   7   3
11  3   8   4
12  4   4   1
13  4   2   2
14  4   4   3
15  4   5   4
16  4   8   5
17  4   0   6

While I can do this with a for loop, my dataframe is huge and the loop is computationally costly. Is there an efficient way to generate such a column? Thanks.

Ishigami

1 Answer

Use groupby with cumcount, which numbers the rows within each group starting from 0 (hence the add(1)):

df['c'] = df.groupby('a').cumcount().add(1)
print(df)

# Output
    a  b  c
0   1  3  1
1   1  4  2
2   1  3  3
3   2  7  1
4   2  5  2
5   2  9  3
6   2  4  4
7   2  2  5
8   3  5  1
9   3  6  2
10  3  7  3
11  3  8  4
12  4  4  1
13  4  2  2
14  4  4  3
15  4  5  4
16  4  8  5
17  4  0  6
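
As a quick sanity check, here is a minimal sketch that compares the cumcount result with a plain Python loop and also tries a small frame where the groups are not contiguous. The names `expected`, `seen`, `matches_loop` and `mixed` are made up for illustration and are not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4],
                   'b': [3, 4, 3, 7, 5, 9, 4, 2, 5, 6, 7, 8, 4, 2, 4, 5, 8, 0]})

# cumcount() numbers the rows of each group starting at 0, so add 1.
df['c'] = df.groupby('a').cumcount() + 1

# Reference result from a plain loop (slow; used here only for verification).
seen = {}
expected = []
for key in df['a']:
    seen[key] = seen.get(key, 0) + 1
    expected.append(seen[key])

matches_loop = df['c'].tolist() == expected
print(matches_loop)

# Hypothetical frame with interleaved groups: rows are still numbered
# per group, in order of appearance.
mixed = pd.DataFrame({'a': [1, 2, 1, 2, 2]})
mixed['c'] = mixed.groupby('a').cumcount() + 1
print(mixed)
```

The counter follows the row order of the frame, so if the numbering should follow some other order, sort the frame first.
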
Corralien