I am currently using statsmodels (although I would also be happy to use scikit-learn) to create a linear regression. On this particular model I am finding that when I add more than one factor, the OLS algorithm produces wild coefficients: some extremely high and some extremely low, apparently cancelling each other out to minimise the cost function. As a result, all of the factors come out statistically insignificant. Is there a way to put an upper or lower limit on the coefficients, so that OLS has to optimise within those boundaries?
1 Answer
I don't know of a way to tell OLS directly that the absolute values of the coefficients must all be less than some constant.
Regularization is a good alternative for this kind of problem, though. Basically, L1 or L2 regularization adds a penalty on the size of the coefficients to the cost function, so large coefficients raise the cost and the optimizer pushes the coefficients of the least informative variables towards zero instead of letting them blow up.
Take a look at lasso, ridge, and elastic net regression. They use L1, L2, and a mix of both forms of regularization, respectively.
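Concretely, both penalties fit into a single objective. Elastic net minimises roughly the following (exact scaling constants, e.g. a 1/2n factor, vary by implementation):

$$\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \alpha \Big( w \sum_j \lvert \beta_j \rvert + (1 - w) \sum_j \beta_j^2 \Big)$$

where alpha controls the overall penalty strength and w (statsmodels calls it L1_wt) mixes L1 (w = 1, lasso) and L2 (w = 0, ridge).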
You can try the following in statsmodels:
# Import OLS
from statsmodels.regression.linear_model import OLS

# Initialize the model with the response y and the design matrix X
reg = OLS(endog=y, exog=X)

# Fit with an elastic net penalty: alpha sets the overall penalty
# strength (the default alpha=0.0 applies no regularization at all),
# and L1_wt mixes the L1 (1.0) and L2 (0.0) penalties
res = reg.fit_regularized(method='elastic_net', alpha=1.0, L1_wt=0.5)
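Since you mentioned you would also be happy to use scikit-learn, here is a minimal sketch of the equivalent regularized fits there, assuming X and y are the same array-like training data as above; the alpha values are placeholders you would tune:

from sklearn.linear_model import Lasso, Ridge, ElasticNet

# L1 penalty: pushes some coefficients exactly to zero
lasso = Lasso(alpha=1.0).fit(X, y)

# L2 penalty: shrinks all coefficients towards zero
ridge = Ridge(alpha=1.0).fit(X, y)

# Mix of both; l1_ratio controls the L1/L2 trade-off
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)

# Fitted coefficients for each model
print(lasso.coef_, ridge.coef_, enet.coef_)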

Arturo Sbr
- I am attempting to use statsmodels.regression.linear_model.OLS.fit_regularized(method='elastic_net', alpha=0.0, L1_wt=0, start_params=None, profile_scale=False, refit=False), but I am not sure where I specify my training data. For OLS it looks like this: sm.OLS(Y_train, X_train). Do you know how to do this? – Charmalade Jun 02 '21 at 13:41
- model = sm.OLS(Y_train, X_train); res = model.fit_regularized(method='elastic_net', alpha=0.0, L1_wt=0, start_params=None, profile_scale=False, refit=False) – Charmalade Jun 02 '21 at 13:56
- I added some code to my answer. Does that answer your question? – Arturo Sbr Jun 02 '21 at 14:00
- And lastly, the .summary() function no longer seems to work now that the fit has been regularized. I can still see the parameters with .params, but I need to see the p-values and the R² value for the model. Is there another way of getting this information? – Charmalade Jun 02 '21 at 14:03
- I'm not sure. I stumbled upon the same problem just now. The statsmodels version I used says "Not implemented" in the docstring of `reg.summary()`. – Arturo Sbr Jun 02 '21 at 14:13
- Thanks for your help. I think this might be the updated one: sm.regression.linear_model.OLSResults – Charmalade Jun 02 '21 at 14:13
- I managed to find a post (https://stackoverflow.com/questions/40072870/statistical-summary-table-in-sklearn-linear-model-ridge) that explains the use of the above function, but unfortunately I get the error: AttributeError: 'OLS' object has no attribute 'cov_params'. The method requires this to run; do you know how I can access it? https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLSResults.html – Charmalade Jun 02 '21 at 14:48
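A note on that last AttributeError: cov_params exists on fitted results objects, not on the OLS model itself, which suggests the model instance is being passed where a results instance is expected; and the results returned by fit_regularized expose only .params in any case. One possible workaround, which the thread does not confirm and is an assumption on my part: refit a plain OLS on only the variables the regularized fit kept at non-zero coefficients, and read the p-values and R² from that refit. A sketch, assuming Y_train and X_train are the NumPy arrays from the comments above:

import numpy as np
import statsmodels.api as sm

# Regularized fit; the returned results only expose .params
res_reg = sm.OLS(Y_train, X_train).fit_regularized(method='elastic_net', alpha=1.0, L1_wt=1.0)

# Indices of the columns whose coefficients survived the penalty
keep = np.flatnonzero(res_reg.params)

# Plain OLS refit on the selected columns gives a full results object
res_ols = sm.OLS(Y_train, X_train[:, keep]).fit()
print(res_ols.summary())  # p-values, R², etc.

Keep in mind that p-values computed after this kind of data-driven variable selection tend to be optimistic.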