
Hi, I have the following two pieces of code. Both produce the same result/output, but I am trying to speed up the execution time as I have to run the code for a lot of different dfs.

I am trying to avoid pandas because it is slower than NumPy, so I am adjusting my existing code to exclude it. I use a function to calculate the slope via linear regression, and in pandas I can easily use the apply method. I have not found a way to do something similar in NumPy, as most of the search results point me back to pandas. I can get the same results using a loop, which works and is quicker (roughly 25%), but I was wondering if there is a more efficient / speedier / more Pythonic way that doesn't require a loop.

import numpy as np
from scipy import stats

def reg_score(array):
    # Fit a least-squares line through the window and return its slope
    y = array
    x = np.arange(len(y))
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    return slope

df["lin_reg_vals"] = df["values"].rolling(window=125,min_periods=125).apply(reg_score)

start = timer()

end = timer()
print(end - start)
start = timer()
for i in range(len(values)):
    if i < 124:
        pass
    else:
        x = momentumScore(values[i - 125+1:i+1])
end = timer()
print(end - start)

Output:

2.543098199999804
1.902773800000432
JPWilson

1 Answer


You're right to try to avoid pandas here; it's only going to slow you down.

As far as I know, there's no easy way to do a rolling regression in pure NumPy/SciPy land (I think you're currently using scipy.stats). But statsmodels has statsmodels.regression.rolling.RollingOLS, which I think will do what you want and should be fast.
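
Something like the following sketch might get you started (untested; I'm assuming values is the 1-D NumPy array from your loop, and that the slope you want is the coefficient on the sample index, as in reg_score):

import numpy as np
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS

x = np.arange(len(values))    # the regressor is just the sample index, as in reg_score
exog = sm.add_constant(x)     # add an intercept column alongside x

res = RollingOLS(values, exog, window=125).fit()

# With ndarray inputs, res.params should be an (n_obs, 2) array:
# column 0 is the intercept, column 1 is the slope; the first 124 rows are NaN.
slopes = res.params[:, 1]

This keeps everything in NumPy arrays and moves the per-window work out of a Python-level loop, which is usually where the speedup comes from.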

By way of another example, there's another question using it, with various suggestions in the thread.

Finally, there's a nice example in LOST, which is just generally worth knowing about.

Matt Hall