2

For instance, suppose this is the dataframe I have:

row    object_id
1      1024
2      1024
3      1024
4      1032
5      1032
6      1048
...    ...

I want to create a column occurence_num as follows:

row    object_id     occurence_num
1      1024          1         # starts count
2      1024          2 
3      1024          3
4      1032          1         # restarts count since object_id has changed
5      1032          2 
6      1048          1         # restarts count since object_id has changed
...    ...           ...

A method of doing it in SQL would also be helpful, but I want to be able to do it using pandas.

NOTE: I've found a way to do it in excel here.

hk1510
  • 35
  • 1
  • 4

2 Answers2

1

You can use cum.count():

df['occurence_num']=df.groupby('object_id').cumcount()+1

>>>print(df)

   row  object_id  occurence_num
    1       1024              1
    2       1024              2
    3       1024              3
    4       1032              1
    5       1032              2
    6       1048              1
IoaTzimas
  • 10,538
  • 2
  • 13
  • 30
1

Input data:

   row  object_id
0    1       1024
1    2       1024
2    3       1024
3    4       1032
4    5       1032
5    6       1048
6    7       1024  # Case 1 -> 4, Case 2 -> 1

Case 1: occurrence_id of row 7 is 4

>>> df.groupby('object_id').cumcount()+1
0    1
1    2
2    3
3    1
4    2
5    1
6    4
dtype: int64

Case 2: occurrence_id of row 7 is 1

>>> df.sub(df.shift()).ne(0).cumsum().groupby('object_id').cumcount() + 1
0    1
1    2
2    3
3    1
4    2
5    1
6    1
dtype: int64
Corralien
  • 109,409
  • 8
  • 28
  • 52