Python: how to calculate average length of times when a variable=yes

Question

I have a set of EURUSD data and looking at arbitrage opportunities. The data is formatted as shown in photo.

mispricing_1=yes when buy_b_sell_A>0 and mispricing_2=yes when buy_A_sell_B>0

In the photo there is no datapoint where exploitable=yes however when the buy_b_sell_A>6 or when buy_A_sell_B>6, then we get exploitable=yes

I am looking to calculate the average length of time an exploitable arbitrage opportunity is present, shown by exploitable=yes

How can I calculate the length of time that there are consecutive exploitable=yes so that I can plot a distribution and then also calculate the average?

Hi, So far have not tried anything, coding is not my forte, sorry — InvestingBetter, Jul 18 '20 at 09:35
You need to be a bit more clear. Like ```exploitable``` is ```exploitable_1``` or ```exploitable_2```? Likewise for ```mispricing```. Imagine the reader has to understand your requirement by reading what you have given. So, make sure you do that. — a_r, Jul 18 '20 at 09:44
I would recommend reading https://stackoverflow.com/help/on-topic — Sinkingpoint, Jul 18 '20 at 09:46
apologies, it would be for both. So I would like to calculate the average time when exploitable_1==yes and when exploitable_2==yes — InvestingBetter, Jul 18 '20 at 09:46
have you tried something like this? `df[df['expliotable_1'].str.contains('yes')].shape` — tandem, Jul 18 '20 at 10:18
You are unlikely to receive good help because your problem is too hard to play with. Post actual data we can copy-paste instead of a photo. Best of all would be a link to any kind of data file, e.g. a csv. As @tandem mentioned, you should look at the `pandas` library for python which does analysis of tabular data like this very efficiently and easily. — Peaceful James, Jul 18 '20 at 10:24
Please include a minimum reproducible example and expected output. This will likely be closed. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — David Erickson, Jul 18 '20 at 10:24
@PeacefulJames I am going to post a csv file with the data in and hopefully this well help. It is really quite a large data file so takes some time to load — InvestingBetter, Jul 18 '20 at 12:22

score 2 · Accepted Answer · answered Jul 23 '20 at 15:18

df=pd.DataFrame(data={'ts':list(range(1,14)),
                  'mp':[0,0,1,1,1,0,0,1,1,0,0,1,0]}) # your data
df.loc[df.mp.diff(1)==1, 'ts1'] = df.ts  # TS1
df.loc[df.mp.diff(1)==-1, 'ts2'] = df.ts  # TS2
df=df[~(df.ts1.isna())|~(df.ts2.isna())] # keep only rows with changes
df.loc[~df.ts2.isna(), 'delta'] = df.ts2 - df.ts1.shift(1)  # TS2-TS1
print (df)

score 0 · Answer 2 · answered Jul 18 '20 at 10:35

0

If you import this as a panda frame, which lets call it df, you can do df.groupby[‘exploitable’].mean You could do .histogram or something for distribution.

answered Jul 18 '20 at 10:35

Jim

36
2

Python: how to calculate average length of times when a variable=yes

2 Answers2