2

I have two vectors x and y, and I want to compute a rolling regression for those, e.g a on (x(1:4),y(1:4)), (x(2:5),y(2:5)), ...
Is there already a function for that? The best algorithm I have in mind for this is O(n), but applying separate linear regressions on every subarrays would be O(n^2). I'm working with Matlab and Python (numpy).

Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112
Flavian Hautbois
  • 2,940
  • 6
  • 28
  • 45
  • Maybe search for Savitzky–Golay filter. – seberg Mar 26 '13 at 12:27
  • I may be mistaken, but isn't doing separate linear regressions also just of `O(n)`? Amount of times you need to do a regression: `O(n)`, work to be done in each regression: `O(1)` (assuming the windowsize is constant) – Dennis Jaheruddin Mar 26 '13 at 12:35
  • That is if you use the results from the previous regression to compute the next one, which is what I want to do, and not what I mean by separate regressions. – Flavian Hautbois Mar 26 '13 at 12:56
  • The Savitzky–Golay does not really adress my needs, as I also want to use the regression error as well as the coefficients of the regression – Flavian Hautbois Mar 26 '13 at 12:58
  • Does pandas.ols do what you want? http://pandas.pydata.org/pandas-docs/dev/computation.html#linear-and-panel-regression – HYRY Mar 26 '13 at 13:04
  • ah, nvm you do not have equal sampling maybe... – seberg Mar 26 '13 at 13:11
  • Maybe this discussion on efficient moving windows will be helpful? http://stackoverflow.com/questions/4936620/using-strides-for-an-efficient-moving-average-filter – Paul Mar 26 '13 at 13:29
  • A little more detail in your question would be helpful. It's not clear to me why the naive solution would be O(n^2). It sounds like your are doing something a bit more sophisticated than windowed linear regression – Paul Mar 26 '13 at 13:33
  • In the naive solution you do the linear regression operation O(n) times. The time to compute them is believed to be O(n) but this may be wrong. – Flavian Hautbois Mar 26 '13 at 15:36
  • Solving the linear system itself is `O(m^p)` with `p` typically between 1 and 2 (I believe). If savitzky-golay approach can be used (i.e. equal spacing), you can basically get it down to `O(n*m) + O(m^p)` (where `O(m^p)` is probably negligible), otherwise you have something like `O(n*m^p)`. (Plus you have a better constant factor). (the m^p is not quite right, but you get the idea) – seberg Mar 26 '13 at 16:45

2 Answers2

2

No, there is NO function that will do a rolling regression, returning all the statistics you wish, doing it efficiently.

That does not mean you can't write such a function. To do so would mean multiple calls to a tool like conv or filter. This is how a Savitsky-Golay tool would work, which DOES do most of what you want. Make one call for each regression coefficient.

Use of up-dating and down-dating tools to use/modify the previous regression estimates will not be as efficient as the calls to conv, since you only need factorize a linear system ONCE when you then do the work with conv. Anyway, there is no need to do an update, as long as the points are uniformly spaced in the series. This is why Savitsky-Golay works.

  • What do you mean by 'most of what you need'? What can I get exactly? – Flavian Hautbois Mar 26 '13 at 14:08
  • You can get each regression coefficient from conv. Predictions are then simple algebraic operations, so computations of the residuals and therefore anything that uses them is trivial. But no single tool gives you what you need. You will need to write the code. –  Mar 26 '13 at 14:23
1
import numpy as np
# y=x*alpha+beta
# window_size - integer, x-numpy array, y-numpy array

x2=np.power(x,2)
xy=x*y
window = np.ones(int(window_size))
a1=np.convolve(xy, window, 'full')*window_size
a2=np.convolve(x, window, 'full')*np.convolve(y, window, 'full')
b1=np.convolve(x2, window, 'full')*window_size
b2=np.power(np.convolve(x, window, 'full'),2)
alphas=(a1-a2)/(b1-b2)
betas=(np.convolve(y, window, 'full')-alphas*np.convolve(x, window, 'full'))/float(window_size)
alphas=alphas[:-1*(window_size-1)] #numpy array of rolled alpha
betas=betas[:-1*(window_size-1)]   #numpy array of rolled beta