How to create a column that keeps track of the number of times the value in another column has occurred until that row in pandas

Question

For instance, suppose this is the dataframe I have:

row    object_id
1      1024
2      1024
3      1024
4      1032
5      1032
6      1048
...    ...

I want to create a column occurence_num as follows:

row    object_id     occurence_num
1      1024          1         # starts count
2      1024          2 
3      1024          3
4      1032          1         # restarts count since object_id has changed
5      1032          2 
6      1048          1         # restarts count since object_id has changed
...    ...           ...

A method of doing it in SQL would also be helpful, but I want to be able to do it using pandas.

NOTE: I've found a way to do it in excel here.

@Corralien for my application it should be 4, but a solution for the situation where it would be 1 would be interesting as well — hk1510, Jul 19 '21 at 23:18

score 1 · Accepted Answer · answered Jul 19 '21 at 23:09

You can use cum.count():

df['occurence_num']=df.groupby('object_id').cumcount()+1

>>>print(df)

   row  object_id  occurence_num
    1       1024              1
    2       1024              2
    3       1024              3
    4       1032              1
    5       1032              2
    6       1048              1

score 1 · Answer 2 · answered Jul 19 '21 at 23:22

Input data:

   row  object_id
0    1       1024
1    2       1024
2    3       1024
3    4       1032
4    5       1032
5    6       1048
6    7       1024  # Case 1 -> 4, Case 2 -> 1

Case 1: occurrence_id of row 7 is 4

>>> df.groupby('object_id').cumcount()+1
0    1
1    2
2    3
3    1
4    2
5    1
6    4
dtype: int64

Case 2: occurrence_id of row 7 is 1

>>> df.sub(df.shift()).ne(0).cumsum().groupby('object_id').cumcount() + 1
0    1
1    2
2    3
3    1
4    2
5    1
6    1
dtype: int64

How to create a column that keeps track of the number of times the value in another column has occurred until that row in pandas

2 Answers2