Add counting column for each categorical value in a particular column

Question

I´m working an oil wells database and I need to get a column counting months since first production.

Note that I need for every natural month the counting value (for instance to compare every well production level in its second month in production) so the solutions with groupby/agg are not useful in this case.

I ´ve doubts on how to achieve the desired month_count column

id	year	month	month_count
A	2022	6	1
A	2022	7	2
A	2022	8	3
A	2022	9	4
B	2022	8	1
B	2022	9	2
C	2022	5	1
C	2022	6	2
C	2022	7	3
C	2022	8	4
C	2022	9	5

Here´s the code, including the desired output (month_count) column

import pandas as pd

data = {
    'id' : ['A','A','A','A','B','B','C','C','C','C','C'],
    'year' : [2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022],
    'month' : [6,7,8,9,8,9,5,6,7,8,9],
    'month_count' : [1,2,3,4,1,2,1,2,3,4,5],
}
  
df = pd.DataFrame(data)

Which solution would be the best in your opinion?

score 0 · Answer 1 · answered Oct 03 '22 at 06:51

0

I m finding it hard to find an appropriate duplicate question for this. I m pretty sure it has been answered before. But here is the answer for this - you need to use cumcount with groupby.

df['month_count'] = df.groupby(['id', 'year']).cumcount() + 1

answered Oct 03 '22 at 06:51

SomeDude

13,876
5
21
44

this answer will not aggregate across different years, if that is a requirement – bn_ln Oct 03 '22 at 06:57
@bn_ln I don't know what you meant. Do you mean cumcount should spread across years? In that case just remove 'year' from groupby? – SomeDude Oct 03 '22 at 07:03
agreed, the groupby year doesn't seem necessary – bn_ln Oct 03 '22 at 07:27

Add counting column for each categorical value in a particular column

1 Answers1