-1

I´m working an oil wells database and I need to get a column counting months since first production.

Note that I need for every natural month the counting value (for instance to compare every well production level in its second month in production) so the solutions with groupby/agg are not useful in this case.

I ´ve doubts on how to achieve the desired month_count column

id year month month_count
A 2022 6 1
A 2022 7 2
A 2022 8 3
A 2022 9 4
B 2022 8 1
B 2022 9 2
C 2022 5 1
C 2022 6 2
C 2022 7 3
C 2022 8 4
C 2022 9 5

Here´s the code, including the desired output (month_count) column

import pandas as pd

data = {
    'id' : ['A','A','A','A','B','B','C','C','C','C','C'],
    'year' : [2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022],
    'month' : [6,7,8,9,8,9,5,6,7,8,9],
    'month_count' : [1,2,3,4,1,2,1,2,3,4,5],
}
  
df = pd.DataFrame(data)

Which solution would be the best in your opinion?

1 Answers1

0

I m finding it hard to find an appropriate duplicate question for this. I m pretty sure it has been answered before. But here is the answer for this - you need to use cumcount with groupby.

df['month_count'] = df.groupby(['id', 'year']).cumcount() + 1
SomeDude
  • 13,876
  • 5
  • 21
  • 44