Two versions of the solution, one slow and one fast, both timed on a DataFrame with len(df) = 3300000.
Slow:
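%%time
d = 1  # position within the current run of consecutive 1s
for i, v in df.iterrows():
    if (v.flag == 1) and (d < 5):
        # 1st to 4th consecutive 1: not flagged
        df.at[i, 'flag1'] = 0
        d += 1
    elif v.flag == 1:
        # 5th consecutive 1: flag it and restart the count
        df.at[i, 'flag1'] = 1
        d = 1
    else:
        # a 0 breaks the run
        df.at[i, 'flag1'] = 0
        d = 1
# integer copy of the result
df['flag2'] = df['flag1'].astype(int)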
Wall time: 4min 27s
Fast:
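%%time
# Needs Python 3.8+ (walrus operator) and a default 0..n-1 index.
d = 1  # position within the current run of consecutive 1s
df['flag1'] = [
    (0, (d := 1))[0] if df.at[i, 'flag'] == 0   # a 0 breaks the run: emit 0, reset d
    else (0, (d := d + 1))[0] if (d % 5) != 0   # 1st to 4th consecutive 1: emit 0
    else (1, (d := 1))[0]                       # 5th consecutive 1: emit 1, reset d
    for i in range(len(df))
]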
Wall time: 1min 1s
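Both versions above still visit the rows one at a time in Python. The same rule (flag every 5th consecutive 1, restart the count after a 0) can also be written without an explicit loop. The following is only a sketch and was not timed on the 3300000-row frame; the function name `every_nth_in_run` and its parameter `n` are made up for illustration.

```python
import pandas as pd

def every_nth_in_run(flag: pd.Series, n: int = 5) -> pd.Series:
    """Return 1 at every n-th consecutive 1 in `flag`, otherwise 0."""
    is_one = flag.eq(1)
    # Every non-1 row starts a new group, so each run of 1s shares
    # the group id of the row that preceded it.
    run_id = (~is_one).cumsum()
    # Running count of 1s inside each group: 1, 2, 3, ... along a run.
    pos = is_one.astype(int).groupby(run_id).cumsum()
    # Flag rows that are a 1 and whose position is a multiple of n.
    return (is_one & (pos % n == 0)).astype(int)

# Small example: six 1s, a 0, then five 1s.
sample = pd.Series([1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1])
result = every_nth_in_run(sample)
```

Usage on the frame from this answer would be `df['flag1'] = every_nth_in_run(df['flag'])`.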
Ignore the "new" column.
|    | flag | flag1 | flag2 | new |
|----|------|-------|-------|-----|
| 0  | 0 | 0 | 0 | 0 |
| 1  | 0 | 0 | 0 | 0 |
| 2  | 1 | 0 | 0 | 0 |
| 3  | 1 | 0 | 0 | 0 |
| 4  | 1 | 0 | 0 | 0 |
| 5  | 1 | 0 | 0 | 0 |
| 6  | 1 | 1 | 1 | 1 |
| 7  | 1 | 0 | 0 | 0 |
| 8  | 1 | 0 | 0 | 0 |
| 9  | 0 | 0 | 0 | 0 |
| 10 | 0 | 0 | 0 | 0 |
| 11 | 0 | 0 | 0 | 0 |
| 12 | 1 | 0 | 0 | 0 |
| 13 | 1 | 0 | 0 | 0 |
| 14 | 1 | 0 | 0 | 0 |
| 15 | 1 | 0 | 0 | 0 |
| 16 | 1 | 1 | 1 | 1 |
| 17 | 1 | 0 | 0 | 0 |
| 18 | 1 | 0 | 0 | 0 |
| 19 | 1 | 0 | 0 | 0 |
| 20 | 1 | 0 | 0 | 0 |
| 21 | 1 | 1 | 1 | 0 |
| 22 | 1 | 0 | 0 | 0 |
| 23 | 1 | 0 | 0 | 0 |
| 24 | 1 | 0 | 0 | 0 |
| 25 | 0 | 0 | 0 | 0 |
| 26 | 0 | 0 | 0 | 0 |
| 27 | 1 | 0 | 0 | 0 |
| 28 | 0 | 0 | 0 | 0 |
| 29 | 1 | 0 | 0 | 0 |
| 30 | 1 | 0 | 0 | 0 |
| 31 | 1 | 0 | 0 | 0 |
| 32 | 1 | 0 | 0 | 0 |
| 33 | 0 | 0 | 0 | 0 |
| 34 | 0 | 0 | 0 | 0 |
| 35 | 1 | 0 | 0 | 0 |
| 36 | 1 | 0 | 0 | 0 |
| 37 | 1 | 0 | 0 | 0 |
| 38 | 1 | 0 | 0 | 0 |
| 39 | 1 | 1 | 1 | 1 |
| 40 | 1 | 0 | 0 | 0 |
| 41 | 1 | 0 | 0 | 0 |
| 42 | 0 | 0 | 0 | 0 |
| 43 | 0 | 0 | 0 | 0 |
| 44 | 0 | 0 | 0 | 0 |
| 45 | 1 | 0 | 0 | 0 |
| 46 | 1 | 0 | 0 | 0 |
| 47 | 1 | 0 | 0 | 0 |
| 48 | 1 | 0 | 0 | 0 |
| 49 | 1 | 1 | 1 | 1 |
For testing purposes, this is how I generated the data:
import pandas as pd

# 33-element pattern repeated 100000 times gives 3300000 rows
A = [0,0,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,1,1,1,1]
A = A * 100000
df = pd.DataFrame({'flag': A})
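To sanity-check any variant on a small sample of this data, one option is a plain-Python reference that iterates over the column as a list rather than through `df.at`. This is a sketch, not a benchmarked replacement; the helper name `mark_every_nth` is hypothetical, and the sample repeats the pattern only twice so it runs instantly.

```python
import pandas as pd

A = [0,0,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,1,1,1,1]

def mark_every_nth(values, n=5):
    """Return a list with 1 at every n-th consecutive 1 in `values`, else 0."""
    out = []
    run = 0  # length of the current run of 1s
    for v in values:
        run = run + 1 if v == 1 else 0  # a non-1 value resets the run
        if run == n:
            out.append(1)
            run = 0  # restart the count after flagging
        else:
            out.append(0)
    return out

# Small sample: the 33-element pattern repeated twice (66 rows).
df = pd.DataFrame({'flag': A * 2})
df['flag1'] = mark_every_nth(df['flag'].tolist())

# Row labels where flag1 == 1, restricted to the first 50 rows.
flagged = df.index[df['flag1'] == 1]
print([i for i in flagged if i < 50])  # -> [6, 16, 21, 39, 49]
```

The printed positions match the rows where flag1 is 1 in the table above.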