How to make flag column value 1 when 5 consective rows in other column are 1

Question

I have one datafrmae which includes two columns flag and flag1, i want to check if flag column values greater than 1 for 5 times or greater than 5 times continuous flag1 value should change to 1

here is example

df=pd.DataFrame({'flag':[0,0,1,1,1,1,1,1,1,0,0,0],'flag1':[0,0,0,0,0,0,1,0,0,0,0,0]})

What happens if there is a string of 20 '1' ? Woud it be a) 00001000000000000000 or b) 00001000010000100001 ? — EBDS, Oct 22 '21 at 04:28

score 1 · Accepted Answer · answered Oct 21 '21 at 06:20

Idea is create consecutive counts and then test 5 if equal:

a = df['flag'].eq(1)
#https://stackoverflow.com/a/52718619/2901002
b = a.cumsum()
df['new'] = b.sub(b.mask(a).ffill().fillna(0)).eq(5).astype(int)


print (df)
    flag  flag1  new
0      0      0    0
1      0      0    0
2      1      0    0
3      1      0    0
4      1      0    0
5      1      0    0
6      1      1    1
7      1      0    0
8      1      0    0
9      0      0    0
10     0      0    0
11     0      0    0

Detail:

print (b.sub(b.mask(a).ffill().fillna(0)))
0     0.0
1     0.0
2     1.0
3     2.0
4     3.0
5     4.0
6     5.0
7     6.0
8     7.0
9     0.0
10    0.0
11    0.0
Name: flag, dtype: float64

score 1 · Answer 2 · answered Oct 21 '21 at 06:29

setup

import pandas as pd
df=pd.DataFrame({'flag':[0,0,1,1,1,1,1,1,1,0,0,0],'flag1':[0,0,0,0,0,0,1,0,0,0,0,0]})

solution

rolling_sum = df["flag"].rolling(5).sum()
df["check"] = ((rolling_sum == 5) & (rolling_sum.diff() == 1)).astype(int)

    flag  flag1  check
0      0      0      0
1      0      0      0
2      1      0      0
3      1      0      0
4      1      0      0
5      1      0      0
6      1      1      1
7      1      0      0
8      1      0      0
9      0      0      0
10     0      0      0
11     0      0      0

EBDS · Answer 3 · 2021-10-22T04:55:58.627

2 versions of the solution, slow and fast for a len(df) = 3300000

Slow:

%%time
d = 1
for i,v in df.iterrows():
    if (v.flag == 1) and (d<5) :
        df.at[i,'flag1'] = 0
        d+=1
    elif (v.flag == 1):
        df.at[i,'flag1'] = 1
        d=1
    else:
        df.at[i,'flag1'] = 0
        d=1
df['flag2']=df['flag1'].astype(int)

Wall time: 4min 27s

Fast:

%%time
from math import floor
d = 1
df['flag1'] = (
        [(0,(d:=1))[0] if df.at[i,'flag']==0 
    else (0, (d := d+1))[0] if (d%5)!=0
    else (1, (d :=1 ))[0]
    for i in range(len(df))
    ] )

Wall time: 1min 1s

Ignore the "new" column.

	flag	flag1	flag2	new
0	0	0	0	0
1	0	0	0	0
2	1	0	0	0
3	1	0	0	0
4	1	0	0	0
5	1	0	0	0
6	1	1	1	1
7	1	0	0	0
8	1	0	0	0
9	0	0	0	0
10	0	0	0	0
11	0	0	0	0
12	1	0	0	0
13	1	0	0	0
14	1	0	0	0
15	1	0	0	0
16	1	1	1	1
17	1	0	0	0
18	1	0	0	0
19	1	0	0	0
20	1	0	0	0
21	1	1	1	0
22	1	0	0	0
23	1	0	0	0
24	1	0	0	0
25	0	0	0	0
26	0	0	0	0
27	1	0	0	0
28	0	0	0	0
29	1	0	0	0
30	1	0	0	0
31	1	0	0	0
32	1	0	0	0
33	0	0	0	0
34	0	0	0	0
35	1	0	0	0
36	1	0	0	0
37	1	0	0	0
38	1	0	0	0
39	1	1	1	1
40	1	0	0	0
41	1	0	0	0
42	0	0	0	0
43	0	0	0	0
44	0	0	0	0
45	1	0	0	0
46	1	0	0	0
47	1	0	0	0
48	1	0	0	0
49	1	1	1	1

For testing purpose, this is how I generated the data:

A = [0,0,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,1,1,1,1]
A = A * 100000
df=pd.DataFrame({'flag':A})

https://stackoverflow.com/questions/24870953/does-pandas-iterrows-have-performance-issues/24871316#24871316 — jezrael, Oct 21 '21 at 06:20
@jezrael Thanks very much for the link. It is very useful for me. I tried different algorithms or pandas manipulation. And I also tried your method. But I found that the output are different. eg Index=21 when there is a long line of 1s. I've written to clarify with OP on the requirements on this. But for academic purpose, the different I find is the need to keep track of the "state change". I couldn't find a way to "remember" the last change when I use pandas manipulation. Once again, thanks very much. — EBDS, Oct 22 '21 at 04:55

	flag	flag1	flag2	new
0	0	0	0	0
1	0	0	0	0
2	1	0	0	0
3	1	0	0	0
4	1	0	0	0
5	1	0	0	0
6	1	1	1	1
7	1	0	0	0
8	1	0	0	0
9	0	0	0	0
10	0	0	0	0
11	0	0	0	0
12	1	0	0	0
13	1	0	0	0
14	1	0	0	0
15	1	0	0	0
16	1	1	1	1
17	1	0	0	0
18	1	0	0	0
19	1	0	0	0
20	1	0	0	0
21	1	1	1	0
22	1	0	0	0
23	1	0	0	0
24	1	0	0	0
25	0	0	0	0
26	0	0	0	0
27	1	0	0	0
28	0	0	0	0
29	1	0	0	0
30	1	0	0	0
31	1	0	0	0
32	1	0	0	0
33	0	0	0	0
34	0	0	0	0
35	1	0	0	0
36	1	0	0	0
37	1	0	0	0
38	1	0	0	0
39	1	1	1	1
40	1	0	0	0
41	1	0	0	0
42	0	0	0	0
43	0	0	0	0
44	0	0	0	0
45	1	0	0	0
46	1	0	0	0
47	1	0	0	0
48	1	0	0	0
49	1	1	1	1

	flag	flag1	flag2	new
0	0	0	0	0
1	0	0	0	0
2	1	0	0	0
3	1	0	0	0
4	1	0	0	0
5	1	0	0	0
6	1	1	1	1
7	1	0	0	0
8	1	0	0	0
9	0	0	0	0
10	0	0	0	0
11	0	0	0	0
12	1	0	0	0
13	1	0	0	0
14	1	0	0	0
15	1	0	0	0
16	1	1	1	1
17	1	0	0	0
18	1	0	0	0
19	1	0	0	0
20	1	0	0	0
21	1	1	1	0
22	1	0	0	0
23	1	0	0	0
24	1	0	0	0
25	0	0	0	0
26	0	0	0	0
27	1	0	0	0
28	0	0	0	0
29	1	0	0	0
30	1	0	0	0
31	1	0	0	0
32	1	0	0	0
33	0	0	0	0
34	0	0	0	0
35	1	0	0	0
36	1	0	0	0
37	1	0	0	0
38	1	0	0	0
39	1	1	1	1
40	1	0	0	0
41	1	0	0	0
42	0	0	0	0
43	0	0	0	0
44	0	0	0	0
45	1	0	0	0
46	1	0	0	0
47	1	0	0	0
48	1	0	0	0
49	1	1	1	1

How to make flag column value 1 when 5 consective rows in other column are 1

3 Answers3

	flag	flag1	flag2	new
0	0	0	0	0
1	0	0	0	0
2	1	0	0	0
3	1	0	0	0
4	1	0	0	0
5	1	0	0	0
6	1	1	1	1
7	1	0	0	0
8	1	0	0	0
9	0	0	0	0
10	0	0	0	0
11	0	0	0	0
12	1	0	0	0
13	1	0	0	0
14	1	0	0	0
15	1	0	0	0
16	1	1	1	1
17	1	0	0	0
18	1	0	0	0
19	1	0	0	0
20	1	0	0	0
21	1	1	1	0
22	1	0	0	0
23	1	0	0	0
24	1	0	0	0
25	0	0	0	0
26	0	0	0	0
27	1	0	0	0
28	0	0	0	0
29	1	0	0	0
30	1	0	0	0
31	1	0	0	0
32	1	0	0	0
33	0	0	0	0
34	0	0	0	0
35	1	0	0	0
36	1	0	0	0
37	1	0	0	0
38	1	0	0	0
39	1	1	1	1
40	1	0	0	0
41	1	0	0	0
42	0	0	0	0
43	0	0	0	0
44	0	0	0	0
45	1	0	0	0
46	1	0	0	0
47	1	0	0	0
48	1	0	0	0
49	1	1	1	1