Question:
Given a dataframe with data such as this:
>>> df
    data
0  START
1   blah
2   blah
3   blah
4   blah
5    END
6  START
7   blah
8   blah
9    END
What is the most efficient way to add a new column with a running number that is incremented at every START? This is my desired result:
>>> df
    data  number
0  START       1
1   blah       1
2   blah       1
3   blah       1
4   blah       1
5    END       1
6  START       2
7   blah       2
8   blah       2
9    END       2
What I've Done
This works, but it is pretty slow (it will be applied to a much larger dataframe), and I'm sure there is a better way to do it:
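counter = 0
df = df.assign(number=0)
# Walk the rows one at a time; bump the counter whenever a START is seen
for i, row in df.iterrows():
    if row['data'] == 'START':
        counter += 1
    # Every row (START, blah, END) gets the current block number
    df.loc[i, 'number'] = counter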
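A vectorized sketch that should produce the same numbering without a Python-level loop, assuming the marker values match 'START' exactly (case- and whitespace-sensitive): flag the START rows with `Series.eq` and take the cumulative sum of those booleans, so the count goes up by one at each START and stays constant until the next one. As in the loop, any rows before the first START would get 0.

```python
import pandas as pd

# Rebuild the example frame from the question
data = ['blah'] * 10
data[0], data[6] = ['START'] * 2
data[5], data[-1] = ['END'] * 2
df = pd.DataFrame({'data': data})

# True on each START row; the running total of those flags is the block number
df['number'] = df['data'].eq('START').cumsum()
print(df['number'].tolist())  # → [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
```

This only looks at the START markers; END rows are ignored, so it assumes every block is opened by a START.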
To reproduce the example dataframe
import pandas as pd
data = ['blah'] * 10
data[0], data[6] = ['START'] * 2
data[5], data[-1] = ['END'] * 2
df = pd.DataFrame({'data':data})
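To check how the loop and a cumulative-sum version compare on a larger frame, one rough approach is to tile the example and time both side by side. The function names `number_with_loop` and `number_vectorized` and the repeat count of 500 are illustrative choices, not from the original; timings depend on the machine and pandas version, so none are quoted here.

```python
import time

import pandas as pd


def number_with_loop(df):
    """Row-by-row approach from the question (hypothetical wrapper name)."""
    counter = 0
    df = df.assign(number=0)
    for i, row in df.iterrows():
        if row['data'] == 'START':
            counter += 1
        df.loc[i, 'number'] = counter
    return df


def number_vectorized(df):
    """Cumulative count of START markers (hypothetical wrapper name)."""
    return df.assign(number=df['data'].eq('START').cumsum())


data = ['blah'] * 10
data[0], data[6] = ['START'] * 2
data[5], data[-1] = ['END'] * 2
df = pd.DataFrame({'data': data})

# Tile the example into a larger frame; 500 repeats is an arbitrary size
big = pd.concat([df] * 500, ignore_index=True)

results = {}
for func in (number_with_loop, number_vectorized):
    start = time.perf_counter()
    results[func.__name__] = func(big)
    print(f"{func.__name__}: {time.perf_counter() - start:.4f} s")

# Both versions should number the rows identically (dtype may differ by platform)
pd.testing.assert_frame_equal(
    results['number_with_loop'], results['number_vectorized'], check_dtype=False
)
```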