
I am trying to fit vector autoregressive (VAR) models using the generalized linear model fitting methods included in scikit-learn. The linear model has the form y = X w, but the system matrix X has a very peculiar structure: it is block-diagonal, and all blocks are identical. To optimize performance and memory consumption the model can be expressed as Y = BW, where B is a block from X, and Y and W are now matrices instead of vectors. The classes LinearRegression, Ridge, RidgeCV, Lasso, and ElasticNet readily accept the latter model structure. However, fitting LassoCV or ElasticNetCV fails due to Y being two-dimensional.
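To illustrate the setup (a minimal sketch with synthetic data; the names B, W, Y follow the notation above): estimators like Ridge accept the matrix-valued target directly, while LassoCV rejects it.

```python
import numpy as np
from sklearn.linear_model import Ridge, LassoCV

# Synthetic stand-in for the VAR block structure: B is one block of
# regressors (e.g. lagged signal values), W maps it to the outputs Y.
rng = np.random.RandomState(0)
n_samples, n_signals = 200, 3
B = rng.randn(n_samples, n_signals)
W_true = rng.randn(n_signals, n_signals)
Y = B @ W_true + 0.01 * rng.randn(n_samples, n_signals)

# Ridge (like LinearRegression, Lasso, ElasticNet) accepts a 2-D target.
ridge = Ridge(alpha=1.0).fit(B, Y)
print(ridge.coef_.shape)  # (3, 3)

# LassoCV refuses the 2-D target and raises a ValueError.
try:
    LassoCV().fit(B, Y)
except ValueError as e:
    print("LassoCV rejects 2-D targets:", e)
```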

I found https://github.com/scikit-learn/scikit-learn/issues/2402. From this discussion I assume that the behavior of LassoCV/ElasticNetCV is intended. Is there a way to optimize the alpha/rho parameters other than manually implementing cross-validation?

Furthermore, Bayesian regression techniques in scikit-learn also expect y to be one-dimensional. Is there any way around this?
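One possible workaround for the Bayesian estimators (a sketch, not an official multi-output API): fit an independent BayesianRidge per output column, since the estimator only accepts a 1-D y. The data here is synthetic.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic example data in the notation of the question.
rng = np.random.RandomState(0)
B = rng.randn(100, 3)
Y = B @ rng.randn(3, 3) + 0.01 * rng.randn(100, 3)

# Fit one BayesianRidge per output column; column j of W_hat holds
# the weights for output j.
W_hat = np.column_stack(
    [BayesianRidge().fit(B, Y[:, j]).coef_ for j in range(Y.shape[1])]
)
print(W_hat.shape)  # (3, 3)
```

This loses any coupling between outputs (each column gets its own hyperparameters), which may or may not be acceptable for a VAR model.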

Note: I use scikit-learn 0.14 (stable)

MB-F
  • Why are you using regression models for an auto-regressive process? What is the actual nature of your system: Y_t=F(Y_{t-1}), Y_t=F(Y_{t-1}, X_t), or Y_t=F(X_t)? – Andrey Shokhin Dec 24 '13 at 08:00
  • I forgot to mention that the AR process is linear with additive noise. So I suppose the nature of the system would be Y_t=F(Y_{t-1}, X_t), where F() is a linear function and X_t is white noise. – MB-F Dec 26 '13 at 10:11
  • So have a look at this: http://statsmodels.sourceforge.net/stable/generated/statsmodels.tsa.vector_ar.var_model.VAR.html#statsmodels.tsa.vector_ar.var_model.VAR – Andrey Shokhin Dec 26 '13 at 12:29
  • Very good suggestion. Statsmodels has all the functionality for one's everyday VAR needs. Unfortunately, there are reasons why I cannot use it: (1) I want to avoid the additional dependency. (2) I need to support regularized and sparse estimators, which are available in scikit-learn. – MB-F Dec 26 '13 at 12:58

2 Answers


How crucial is the performance and memory optimization gained by using this formulation of the regression? Given that your reformulation breaks scikit-learn, I wouldn't really call it an optimization... I would suggest:

  1. Running the unoptimized version and waiting (if possible).

  2. Git pull the following code, which supposedly solves your problem. It's referenced in the conversation you posted from the scikit-learn github project. See here for instructions on building scikit-learn from a git pull. You can then add the branched scikit-learn location to your python path and execute your regression using the modified library code. Be sure to post your experiences and any issues you encounter; I'm sure the scikit developers would appreciate it.
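For reference, the "unoptimized version" from suggestion 1 can be sketched as follows (synthetic data, names as in the question): expand Y = BW back into y = Xw with the block-diagonal X = I ⊗ B, so that LassoCV sees a 1-D target. This is exactly the formulation the questioner wants to avoid, since X grows by a factor of n_signals in both dimensions.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic stand-in for the VAR block structure.
rng = np.random.RandomState(0)
n_samples, n_signals = 100, 3
B = rng.randn(n_samples, n_signals)
W_true = rng.randn(n_signals, n_signals)
Y = B @ W_true + 0.01 * rng.randn(n_samples, n_signals)

# vec(Y) = (I ⊗ B) vec(W), using column-major vectorization.
X = np.kron(np.eye(n_signals), B)  # (n_samples*n_signals, n_signals**2)
y = Y.ravel(order='F')

# LassoCV now works, cross-validating a single alpha for the whole model.
model = LassoCV(cv=5).fit(X, y)
W_hat = model.coef_.reshape(n_signals, n_signals, order='F')
print(W_hat.shape)  # (3, 3)
```

The memory blow-up of np.kron is the cost being traded away by the Y = BW formulation.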

fredbaba
  • I have awarded the bounty for your answer because it contains good suggestions. However, I am not yet entirely convinced to accept your answer. Running the unoptimized version will be worth a try, but from a previous matlab implementation I expect a huge difference in memory consumption (factor 100) and performance (seconds -> minutes). I have not considered your second suggestion before because they seemed to optimize each 'task' independently (i.e. different alpha for each row in **W**). Is this actually the case? – MB-F Dec 29 '13 at 09:58
  • That seems to be the case. If I'm understanding the discussion from the link you posted, the original implementors felt that Lasso and E-Net were ill-suited to multiple regression problems. Because E-Net (and Lasso as a subset of E-Net) are sensitive to the choice of alpha, in a multiple regression one component task might "dominate" the regression, leading to poor performance on the other tasks. Can you give some more background on the dimensionality and nature of your problem? Was your original problem a VAR problem, or did the VAR structure arise from your reformulation? – fredbaba Dec 29 '13 at 18:47
  • Also, if this discussion gets sufficiently technical it might be worth asking the community at [Cross Validated](http://stats.stackexchange.com/)... – fredbaba Dec 29 '13 at 18:49
  • The original problem is fitting VAR models to time series data (EEG sources) for subsequent connectivity analysis. I am writing a library, so I need to be flexible in dimensionality. My own implementation of ridge regression works well enough, but I want to support sklearn as a model fitting backend so that different methods are available to the user, without me reinventing the wheel. Using a different regularization parameter for each "task" does not feel right; the original problem is not expressed in terms of "tasks" but consists of one model describing the interaction of multiple signals. – MB-F Jan 04 '14 at 17:04

To predict matrices instead of vectors, Lasso and ElasticNet have MultiTask* counterparts:

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.MultiTaskLasso.html
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.MultiTaskElasticNet.html
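A minimal sketch with synthetic data (names follow the question's notation): MultiTaskLasso fits a 2-D target matrix directly, and MultiTaskLassoCV cross-validates the shared alpha.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso, MultiTaskLassoCV

# Synthetic stand-in for the VAR block structure.
rng = np.random.RandomState(0)
B = rng.randn(100, 3)
W_true = rng.randn(3, 3)
Y = B @ W_true + 0.01 * rng.randn(100, 3)

# MultiTaskLasso accepts the 2-D target directly.
mtl = MultiTaskLasso(alpha=0.1).fit(B, Y)
print(mtl.coef_.shape)  # (n_tasks, n_features) = (3, 3)

# MultiTaskLassoCV selects a single shared alpha by cross-validation.
mtl_cv = MultiTaskLassoCV(cv=5).fit(B, Y)
print(mtl_cv.alpha_)
```

One caveat: the MultiTask* estimators use a mixed L2/L1 (group) penalty, so all tasks share one alpha and a common row-wise sparsity pattern, which may or may not match the VAR use case described above.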