I have some pandas code that computes the R² of a linear regression over a rolling window of size x. Here is my code:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def lr_r2_Sklearn(data):
    # Regress the window's values against their positions 0..n-1
    # and return the R² of the fit
    data = np.asarray(data)
    X = np.arange(len(data)).reshape(-1, 1)
    Y = data.reshape(-1, 1)
    regressor = LinearRegression()
    regressor.fit(X, Y)
    return regressor.score(X, Y)

r2_rolling = df[['value']].rolling(300).agg([lr_r2_Sklearn])
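(With rolling(300) and the default min_periods, the first 299 rows of r2_rolling come out as NaN, since the window isn't full yet.)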
I am using a rolling window of size 300 and computing the R² for each window. I would like to do the exact same thing with PySpark and a Spark DataFrame. I know I must use the Window function, but it's a bit harder to understand than pandas, so I am lost...
I have this, but I don't know how to make it work:
from pyspark.sql.functions import lit
from pyspark.sql.window import Window

w = Window().partitionBy(lit(1)).rowsBetween(-299, 0)
data.select(lr_r2('value').over(w).alias('r2')).show()

(lr_r2 is supposed to return the R² of each window.)
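For context, here is a sketch of the direction I am considering, not code I trust yet: gather each window into an array with collect_list over the window, then compute the R² in a regular Python UDF. The time column used for ordering is a placeholder for whatever column defines my row order (Spark seems to require an orderBy for a rows-based frame), and instead of calling sklearn inside the UDF I use the fact that, for a simple linear regression, R² equals the squared Pearson correlation between x and y:

import numpy as np
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType
from pyspark.sql.window import Window

# R² of a simple linear regression of the values against their positions.
# With a single feature, R² is the squared Pearson correlation of x and y.
@F.udf(DoubleType())
def lr_r2(values):
    if values is None or len(values) < 2:
        return None
    y = np.asarray(values, dtype=float)
    x = np.arange(len(y), dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)

# 'time' is a placeholder: any column that defines the row order works
w = Window.partitionBy(F.lit(1)).orderBy('time').rowsBetween(-299, 0)

result = (
    data
    .withColumn('vals', F.collect_list('value').over(w))
    # keep only full windows of 300 rows, like pandas rolling(300)
    .withColumn('r2', F.when(F.size('vals') == 300, lr_r2('vals')))
    .drop('vals')
)
result.show()

I realize partitionBy(lit(1)) forces everything into a single partition, so this probably won't scale to very large DataFrames, but it mirrors the pandas behaviour. Is this the right direction, or is there a more idiomatic way?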
Thanks!