
I'm trying to calculate a slope on individual subsets of two fields in a dataframe and have that slope value applied to all rows in each subset. (I've used the "slope" function in Excel previously, although I'm not married to the exact algorithm.) The "desired_output" field is what I'm expecting as the output. The subsets are distinguished by the "strike_order" column: each subset starts at 1 and has no specific highest value.

"IV" is the y value and "strike" is the x value.

Any help would be appreciated as I don't even know where to begin with this....

import pandas
df = pandas.DataFrame([[1200,1,.4,0.005],[1210,2,.35,0.005],[1220,3,.3,0.005],
[1230,4,.25,0.005],[1200,1,.4,0.003],[1210,2,.37,.003]],columns=
["strike","strike_order","IV","desired_output"])
df

    strike  strike_order    IV  desired_output
0   1200        1         0.40    0.005
1   1210        2         0.35    0.005
2   1220        3         0.30    0.005
3   1230        4         0.25    0.005
4   1200        1         0.40    0.003
5   1210        2         0.37    0.003

Let me know if this isn't a well posed question and I'll try to make it better.

Benson Burns

3 Answers


You can use NumPy's least squares solver. We can rewrite the line equation y = mx + c as y = Ap, where A = [[x 1]] and p = [[m], [c]]. Then use lstsq to solve for p, so we need to create A by adding a column of ones to df.

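import numpy as np

# Design matrix A = [strike, 1], so that y = A @ [m, c]
df['ones'] = 1
A = df[['strike', 'ones']]
y = df['IV']
# rcond=None selects NumPy's current default cutoff and avoids a FutureWarning
m, c = np.linalg.lstsq(A, y, rcond=None)[0]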
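
As a sanity check, here is a minimal sketch comparing lstsq with the closed-form least-squares slope, which as far as I know is the same formula Excel's SLOPE uses. The arrays are just the first subset from the question (strike as x, IV as y), and the variable names are my own.

```python
import numpy as np

# First subset from the question: strike is x, IV is y.
x = np.array([1200.0, 1210.0, 1220.0, 1230.0])
y = np.array([0.40, 0.35, 0.30, 0.25])

# Closed-form slope: sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
slope_formula = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Same fit through lstsq with the design matrix A = [x, 1]
A = np.column_stack([x, np.ones_like(x)])
m, c = np.linalg.lstsq(A, y, rcond=None)[0]

print(slope_formula, m, c)
```

Both routes should give a slope of about -0.005 for this subset. Note the sign is negative, whereas the "desired_output" column in the question shows 0.005.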

Alternatively, you can use scikit-learn's linear_model.LinearRegression model.

You can verify the result by plotting the data as a scatter plot and the fitted line as a line plot.

import matplotlib.pyplot as plt
plt.scatter(df['strike'], df['IV'], color='r', marker='d')
x = df['strike']
# plug x into the equation y = mx + c
y_line = c + m * x
# draw the fitted line (y_line), not the raw IV values
plt.plot(x, y_line)
plt.xlabel('Strike')
plt.ylabel('IV')
plt.show()

The resulting plot (image not reproduced here) shows the data points with the fitted line.

sgDysregulation

Try this.

First, create a subset column by iterating over the dataframe, treating each row where strike_order returns to 1 as the boundary between subsets.

# create subset column (assumes the default 0..n-1 integer index)
subset_counter = 0
for index, row in df.iterrows():
    if row["strike_order"] == 1:
        df.loc[index, 'subset'] = subset_counter
        subset_counter += 1
    else:
        df.loc[index, 'subset'] = df.loc[index - 1, 'subset']

df['subset'] = df['subset'].astype(int)
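
If the loop feels heavy, the same labels can probably be built without iterating. This is only a sketch, assuming each subset begins with a row where strike_order equals 1 (as in the question's sample data): a running count of those rows gives the subset number.

```python
import pandas as pd

df = pd.DataFrame({
    "strike": [1200, 1210, 1220, 1230, 1200, 1210],
    "strike_order": [1, 2, 3, 4, 1, 2],
    "IV": [0.40, 0.35, 0.30, 0.25, 0.40, 0.37],
})

# True on each row that starts a new subset; the cumulative sum numbers
# the subsets 1, 2, ... and subtracting 1 makes them start at 0.
df["subset"] = df["strike_order"].eq(1).cumsum() - 1

print(df)
```

This does not look at the previous row by index, so it does not depend on the index being 0..n-1.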

Then run a linear regression over each subset using groupby:

# run linear regression on subsets of the dataframe using groupby
from sklearn import linear_model
model = linear_model.LinearRegression()
for group, df_gp in df.groupby('subset'):
    X = df_gp[['strike']]
    y = df_gp.IV
    model.fit(X, y)
    # coef_ is a one-element array; take the scalar slope
    df.loc[df.subset == group, 'slope'] = model.coef_[0]

df

   strike  strike_order    IV  desired_output  subset  slope
0    1200             1  0.40           0.005       0 -0.005
1    1210             2  0.35           0.005       0 -0.005
2    1220             3  0.30           0.005       0 -0.005
3    1230             4  0.25           0.005       0 -0.005
4    1200             1  0.40           0.003       1 -0.003
5    1210             2  0.37           0.003       1 -0.003
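
For anyone who would rather not pull in scikit-learn, a rough equivalent can be sketched with np.polyfit and groupby. The helper name fit_slope is my own, and the subset column is built here with a cumulative count, which assumes every subset starts at strike_order 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "strike": [1200, 1210, 1220, 1230, 1200, 1210],
    "strike_order": [1, 2, 3, 4, 1, 2],
    "IV": [0.40, 0.35, 0.30, 0.25, 0.40, 0.37],
})
df["subset"] = df["strike_order"].eq(1).cumsum() - 1

def fit_slope(group):
    # Degree-1 polynomial fit returns [slope, intercept]; keep the slope.
    x = group["strike"].to_numpy(dtype=float)
    y = group["IV"].to_numpy(dtype=float)
    return np.polyfit(x, y, 1)[0]

# One slope per subset, then broadcast back onto every row of that subset.
slopes = df.groupby("subset")[["strike", "IV"]].apply(fit_slope)
df["slope"] = df["subset"].map(slopes)

print(df)
```

On the sample data this should reproduce the slope column above, roughly -0.005 for the first subset and -0.003 for the second.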

@Scott This worked, except that the subset values went 0, 1 and then all subsequent subsets were 2. I added an extra conditional at the beginning and a very clumsy "seed" value to stop it looking for row -1.

import scipy
seed = df.loc[0, "date_exp"]
#seed ="08/11/200015/06/2001C"
#print(seed)
subset_counter = 0
for index, row in df.iterrows():
    #if index['strike_order']==0:
    if row['date_exp'] == seed:
        df.loc[index, 'subset'] = 0
    elif row["strike_order"] == 1:
        df.loc[index, 'subset'] = subset_counter
        subset_counter = 1 + df.loc[index - 1, 'subset']
    else:
        df.loc[index, 'subset'] = df.loc[index - 1, 'subset']

df['subset'] = df['subset'].astype(int)

This now does exactly what I want, although I think using the seed value is clunky; I would have preferred to use something like if row == 0. But it's Friday and this works.
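
One way the seed might be avoided, sketched here on made-up strike_order values rather than my real data: keep the running label in a plain Python variable instead of reading it back from row index-1, so the first row never needs special handling. The names current and labels are just illustrative.

```python
import pandas as pd

# Hypothetical strike_order values covering three subsets.
df = pd.DataFrame({"strike_order": [1, 2, 3, 1, 2, 1, 2, 3]})

labels = []
current = -1  # becomes 0 at the first row where strike_order == 1
for order in df["strike_order"]:
    if order == 1:
        current += 1
    # max(..., 0) keeps the label at 0 if the data does not start at 1
    labels.append(max(current, 0))

df["subset"] = labels
print(df)
```

Because nothing is looked up by index-1, this should also work when the dataframe index is not a plain 0..n-1 range.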

Cheers

Benson Burns