I have a transactional operation that produces a feed like the one below:
```python
import pandas as pd
df = pd.DataFrame({'action': ['transacted', 'transacted', 'transacted', 'transacted', 'undo', 'transacted', 'transacted', 'transacted', 'transacted', 'transacted', 'undo', 'undo', 'undo', 'transacted'],
                   'transaction_count': [10, 20, 35, 60, 60, 60, 80, 90, 100, 10, 10, 100, 90, 90]})
```
|  | action | transaction_count |
|---|---|---|
| 0 | transacted | 10 |
| 1 | transacted | 20 |
| 2 | transacted | 35 |
| 3 | transacted | 60 |
| 4 | undo | 60 |
| 5 | transacted | 60 |
| 6 | transacted | 80 |
| 7 | transacted | 90 |
| 8 | transacted | 100 |
| 9 | transacted | 10 |
| 10 | undo | 10 |
| 11 | undo | 100 |
| 12 | undo | 90 |
| 13 | transacted | 90 |
The counts follow a repeating pattern, though not a linear one (10-20-35-60-80-90-100-10-20...).
An `undo` row states which transaction count is cancelled.
There can be several `undo` rows in a row, one per cancelled transaction.
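One way to read these rules is as a stack: each `transacted` row is pushed, and each `undo` pops the most recent transaction that has not already been cancelled. The sketch below assumes exactly that; the helper name `drop_undone_stack` is hypothetical, it ignores an `undo` with nothing left to cancel, and it does not check the undo's own count (in the sample data that count always matches the row being popped).

```python
import pandas as pd


def drop_undone_stack(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replay the feed, letting each 'undo' cancel
    the most recent transaction that is still standing."""
    kept = []  # row positions of surviving 'transacted' rows, used as a stack
    for pos, action in enumerate(df['action'].to_numpy()):
        if action == 'undo':
            if kept:  # assumption: an undo with nothing to cancel is ignored
                kept.pop()
        else:
            kept.append(pos)
    return df.iloc[kept].reset_index(drop=True)


df = pd.DataFrame({'action': ['transacted', 'transacted', 'transacted', 'transacted', 'undo', 'transacted', 'transacted', 'transacted', 'transacted', 'transacted', 'undo', 'undo', 'undo', 'transacted'],
                   'transaction_count': [10, 20, 35, 60, 60, 60, 80, 90, 100, 10, 10, 100, 90, 90]})
result = drop_undone_stack(df)
print(result)
```

On the sample frame this keeps the original rows 0, 1, 2, 5, 6 and 13, i.e. the counts 10, 20, 35, 60, 80, 90.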
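```python
# Initial attempt: flag every row that is immediately followed by an 'undo'
df['is_undone'] = df.apply(lambda x: 1 if x['action'] == 'undo' else 0, axis=1).shift(-1)
df = df.fillna(0)  # shift(-1) leaves NaN in the last row
df = df.loc[df['is_undone'] == 0]    # drop rows cancelled by the next row
df = df.loc[df['action'] != 'undo']  # drop the 'undo' rows themselves
df.reset_index(drop=True, inplace=True)
```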
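The mask in this attempt only looks one row ahead, so in the run of three `undo` rows (10 to 12) only row 9 is removed, while rows 7 and 8 (counts 90 and 100) survive. The same one-row look-ahead, rewritten without `apply` (an equivalent rewrite for illustration, not the original code), reproduces this on the sample frame:

```python
import pandas as pd

df = pd.DataFrame({'action': ['transacted', 'transacted', 'transacted', 'transacted', 'undo', 'transacted', 'transacted', 'transacted', 'transacted', 'transacted', 'undo', 'undo', 'undo', 'transacted'],
                   'transaction_count': [10, 20, 35, 60, 60, 60, 80, 90, 100, 10, 10, 100, 90, 90]})

# True where the *next* row is an 'undo' (last row has no next row -> False)
followed_by_undo = df['action'].eq('undo').shift(-1, fill_value=False)
# Keep rows not followed by an undo, and drop the undo rows themselves
attempt = df.loc[~followed_by_undo & df['action'].ne('undo')].reset_index(drop=True)
print(attempt['transaction_count'].tolist())  # [10, 20, 35, 60, 80, 90, 100, 90]
```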
Unfortunately, this only works for a single `undo`, not for several in a row. `apply` does not give access to neighbouring rows' values, and I can't think of another solution. It also needs to handle about 300k rows, so performance matters too.
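For around 300k rows, one option is a vectorized formulation with no per-row Python work. Treat each `transacted` row as +1 and each `undo` as -1; the running sum `depth` is then the number of transactions still standing. A transaction pushed at depth d is cancelled the first time the running sum later falls back to d - 1, so it survives exactly when the minimum of the running sum over all later rows is still at least d. This is a sketch under two assumptions: every `undo` has an earlier, not-yet-cancelled transaction to cancel (the running sum never goes negative), and the undo's own count can be ignored. The function name `drop_undone_vectorized` is hypothetical.

```python
import numpy as np
import pandas as pd


def drop_undone_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    is_undo = df['action'].to_numpy() == 'undo'
    # +1 per transaction, -1 per undo; the running sum is the stack size
    depth = np.cumsum(np.where(is_undo, -1, 1))
    # Suffix minimum: smallest running sum over rows i..n-1
    suffix_min = np.minimum.accumulate(depth[::-1])[::-1]
    # Smallest running sum strictly after row i (nothing follows the last row)
    later_min = np.append(suffix_min[1:], np.iinfo(depth.dtype).max)
    # A transaction survives if the stack never shrinks below its own depth later
    keep = ~is_undo & (later_min >= depth)
    return df.loc[keep].reset_index(drop=True)


df = pd.DataFrame({'action': ['transacted', 'transacted', 'transacted', 'transacted', 'undo', 'transacted', 'transacted', 'transacted', 'transacted', 'transacted', 'undo', 'undo', 'undo', 'transacted'],
                   'transaction_count': [10, 20, 35, 60, 60, 60, 80, 90, 100, 10, 10, 100, 90, 90]})
result = drop_undone_vectorized(df)
print(result)
```

Every step (`np.cumsum`, `np.minimum.accumulate` on the reversed array, the boolean mask) is a single NumPy pass, so the cost is linear in the number of rows. A plain loop that pushes and pops row positions selects the same rows under the same assumptions and is easier to read, but it runs at Python speed.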
The expected result is:
|  | action | transaction_count |
|---|---|---|
| 0 | transacted | 10 |
| 1 | transacted | 20 |
| 2 | transacted | 35 |
| 3 | transacted | 60 |
| 4 | transacted | 80 |
| 5 | transacted | 90 |
Thanks in advance!