I have a dataframe with 3 columns; signal is either 0 or 1. For each symbol, I need to count the number of rows since the most recent signal was generated, with the signal row itself counted as 1. The count should stay at 0 until the first signal appears. Sample data is as follows -
Time symbol signal
09:15 abc 0
09:16 abc 0
09:17 abc 0
09:18 abc 1
09:19 abc 0
09:20 abc 0
09:21 abc 0
09:22 abc 1
09:23 abc 0
09:24 abc 1
09:25 abc 1
09:26 abc 0
09:15 xyz 0
09:16 xyz 0
09:17 xyz 1
09:18 xyz 0
09:19 xyz 0
09:20 xyz 0
09:21 xyz 1
09:22 xyz 0
09:23 xyz 0
09:24 xyz 0
09:25 xyz 0
09:26 xyz 0
Expected output -
Time symbol signal MinsSinceSignal
09:15 abc 0 0
09:16 abc 0 0
09:17 abc 0 0
09:18 abc 1 1
09:19 abc 0 2
09:20 abc 0 3
09:21 abc 0 4
09:22 abc 1 1
09:23 abc 0 2
09:24 abc 1 1
09:25 abc 1 1
09:26 abc 0 2
09:15 xyz 0 0
09:16 xyz 0 0
09:17 xyz 1 1
09:18 xyz 0 2
09:19 xyz 0 3
09:20 xyz 0 4
09:21 xyz 1 1
09:22 xyz 0 2
09:23 xyz 0 3
09:24 xyz 0 4
09:25 xyz 0 5
09:26 xyz 0 6
I have tried the solution from "Cumsum within group and reset on condition in pandas", but it's not working as expected.
df['G']=df.groupby('symbol').signal.apply(lambda x :(x.diff().ne(0)&x==1)|x==1)
df['MinsSinceSignal']= df.groupby([df.symbol,df.G.cumsum()]).G.apply(lambda x : (~x).cumsum())
There are a couple of issues with the above code.
- It doesn't start with 0 for the rows before the first signal.
- When signal is 1, the count starts from the next row instead of from the signal row itself.
Please help!
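One possible approach, offered as a minimal sketch rather than a definitive fix: take a per-symbol running sum of signal as a block id (it increases by 1 at every signal row and is 0 before the first signal), number the rows within each (symbol, block) pair starting at 1, and force the count to 0 wherever the block id is still 0. The sketch assumes the rows are already sorted by Time within each symbol; the name `block` is only illustrative.

```python
import pandas as pd

# Sample data from the question: 12 one-minute rows per symbol.
times = [f"09:{m}" for m in range(15, 27)] * 2
df = pd.DataFrame({
    "Time": times,
    "symbol": ["abc"] * 12 + ["xyz"] * 12,
    "signal": [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0,
               0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
})

# Block id: running count of signals within each symbol.
# It is 0 before the first signal and increases by 1 at every signal row.
block = df.groupby("symbol")["signal"].cumsum().rename("block")

# Position of each row inside its (symbol, block) run, starting at 1,
# so the signal row itself gets 1 and later rows get 2, 3, ...
count = df.groupby([df["symbol"], block]).cumcount() + 1

# Rows before the first signal of a symbol (block == 0) get 0.
df["MinsSinceSignal"] = count.where(block > 0, 0)

print(df)
```

Because a new block begins at every row where signal is 1, consecutive signals (such as 09:24 and 09:25 for abc) each restart the count at 1, which matches the expected output shown in the question.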