2

I have a set of EURUSD data and looking at arbitrage opportunities. The data is formatted as shown in photo.

enter image description here

mispricing_1=yes when buy_b_sell_A>0 and mispricing_2=yes when buy_A_sell_B>0

In the photo there is no datapoint where exploitable=yes however when the buy_b_sell_A>6 or when buy_A_sell_B>6, then we get exploitable=yes

I am looking to calculate the average length of time an exploitable arbitrage opportunity is present, shown by exploitable=yes

How can I calculate the length of time that there are consecutive exploitable=yes so that I can plot a distribution and then also calculate the average?

  • 1
    Could you show what you tried? – Amessihel Jul 18 '20 at 09:35
  • Hi, So far have not tried anything, coding is not my forte, sorry – InvestingBetter Jul 18 '20 at 09:35
  • You need to be a bit more clear. Like ```exploitable``` is ```exploitable_1``` or ```exploitable_2```? Likewise for ```mispricing```. Imagine the reader has to understand your requirement by reading what you have given. So, make sure you do that. – a_r Jul 18 '20 at 09:44
  • 4
    I would recommend reading https://stackoverflow.com/help/on-topic – Sinkingpoint Jul 18 '20 at 09:46
  • apologies, it would be for both. So I would like to calculate the average time when exploitable_1==yes and when exploitable_2==yes – InvestingBetter Jul 18 '20 at 09:46
  • have you tried something like this? `df[df['expliotable_1'].str.contains('yes')].shape` – tandem Jul 18 '20 at 10:18
  • @tandem will try now and revert back, thank you – InvestingBetter Jul 18 '20 at 10:21
  • 1
    You are unlikely to receive good help because your problem is too hard to play with. Post actual data we can copy-paste instead of a photo. Best of all would be a link to any kind of data file, e.g. a csv. As @tandem mentioned, you should look at the `pandas` library for python which does analysis of tabular data like this very efficiently and easily. – Peaceful James Jul 18 '20 at 10:24
  • Please include a minimum reproducible example and expected output. This will likely be closed. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – David Erickson Jul 18 '20 at 10:24
  • @PeacefulJames I am going to post a csv file with the data in and hopefully this well help. It is really quite a large data file so takes some time to load – InvestingBetter Jul 18 '20 at 12:22

2 Answers2

2
df=pd.DataFrame(data={'ts':list(range(1,14)),
                  'mp':[0,0,1,1,1,0,0,1,1,0,0,1,0]}) # your data
df.loc[df.mp.diff(1)==1, 'ts1'] = df.ts  # TS1
df.loc[df.mp.diff(1)==-1, 'ts2'] = df.ts  # TS2
df=df[~(df.ts1.isna())|~(df.ts2.isna())] # keep only rows with changes
df.loc[~df.ts2.isna(), 'delta'] = df.ts2 - df.ts1.shift(1)  # TS2-TS1
print (df)
0

If you import this as a panda frame, which lets call it df, you can do df.groupby[‘exploitable’].mean You could do .histogram or something for distribution.

Jim
  • 36
  • 2