I am currently running a multiple linear regression on a dataset. At first, I didn't realize I needed to put constraints on my weights; in fact, I need some specific weights to be positive and others to be negative.

To be more precise, I am building a scoring system, which is why some of my variables should have a positive or a negative impact on the score. Yet when I run my model, the results do not match what I expect: some of my 'positive' variables get negative coefficients and vice versa.

As an example, let's suppose my model is:

y = W0*x0 + W1*x1 + W2*x2 

Since x2 is a 'positive' variable, I would like to constrain W2 to be positive!

I have searched a lot about this issue but haven't found anything about constraints on specific weights/coefficients; all I have found is about forcing all coefficients to be positive or constraining them to sum to one.

I am working in Python using the scikit-learn package. This is how I get my best model:

from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV  # replaces the old sklearn.grid_search module

def ridge(Xtrain, Xtest, Ytrain, Ytest, position):
    param_grid = {'alpha': [0.01, 0.1, 1, 10, 50, 100, 1000]}
    gs = GridSearchCV(Ridge(), param_grid=param_grid, n_jobs=-1, cv=3)
    gs.fit(Xtrain, Ytrain)
    hatytrain = gs.predict(Xtrain)
    hatytest = gs.predict(Xtest)
    return gs, hatytrain, hatytest

Any idea how I could put a constraint on the coefficient of a specific variable? It will probably be burdensome to define each constraint, but I have no idea how to do it otherwise.

Benjamin Salem
  • Why do you need to use scikit-learn for this? This is just a function-fitting problem, isn't it? I bet there are better packages for just that task, where constraints on the to-be-fit parameters can be easily specified – mzoll May 18 '18 at 11:24
  • You might consider implementing "brick wall" constraints, in which if inside the function being fitted a constraint is violated, a very large value - and therefore a very large error - is returned. This method is somewhat crude, however as a practical matter it is easily coded and easy to test. – James Phillips May 18 '18 at 13:05
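
To illustrate the "brick wall" approach from the comment above, here is a minimal sketch using scipy.optimize.curve_fit on synthetic data; the model and the choice of constraining w2 are purely illustrative assumptions, not part of the question. Note that curve_fit also accepts a bounds argument, which is usually the cleaner way to express simple box constraints:

import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model y = w0*x0 + w1*x1 + w2*x2 with the requirement w2 >= 0
def model(X, w0, w1, w2):
    if w2 < 0:
        # "brick wall": the constraint is violated, so return huge values and hence a huge error
        return np.full(X.shape[0], 1e10)
    return w0 * X[:, 0] + w1 * X[:, 1] + w2 * X[:, 2]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.dot(X, np.array([1.5, -2.0, 0.5])) + rng.normal(scale=0.1, size=200)

popt, _ = curve_fit(model, X, y, p0=[0.0, 0.0, 0.1])
print(popt)   # w2 comes out non-negative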

2 Answers

Scikit-learn does not allow such constraints on the coefficients.

But you can impose any constraints on the coefficients and optimize the loss with coordinate descent if you implement your own estimator. In the unconstrained case, coordinate descent produces the same result as OLS in a reasonable number of iterations.

I've written a class that imposes upper and lower bounds on LinearRegression coefficients. You can extend it to use a Ridge or even Lasso penalty if you want:

from sklearn.linear_model.base import LinearModel  # in scikit-learn >= 0.22 this lives in sklearn.linear_model._base
from sklearn.base import RegressorMixin
from sklearn.utils import check_X_y
import numpy as np

class ConstrainedLinearRegression(LinearModel, RegressorMixin):

    def __init__(self, fit_intercept=True, normalize=False, copy_X=True, nonnegative=False, tol=1e-15):
        self.fit_intercept = fit_intercept
        self.normalize = normalize
        self.copy_X = copy_X
        self.nonnegative = nonnegative
        self.tol = tol

    def fit(self, X, y, min_coef=None, max_coef=None):
        X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'], y_numeric=True, multi_output=False)
        X, y, X_offset, y_offset, X_scale = self._preprocess_data(
            X, y, fit_intercept=self.fit_intercept, normalize=self.normalize, copy=self.copy_X)
        # default to unbounded coefficients unless explicit limits are given
        self.min_coef_ = min_coef if min_coef is not None else np.repeat(-np.inf, X.shape[1])
        self.max_coef_ = max_coef if max_coef is not None else np.repeat(np.inf, X.shape[1])
        if self.nonnegative:
            self.min_coef_ = np.clip(self.min_coef_, 0, None)

        # coordinate descent: take a Newton step for one coefficient at a time
        # and clip it to its [min, max] interval, until the coefficients stop changing
        beta = np.zeros(X.shape[1]).astype(float)
        prev_beta = beta + 1
        hessian = np.dot(X.transpose(), X)
        while not (np.abs(prev_beta - beta) < self.tol).all():
            prev_beta = beta.copy()
            for i in range(len(beta)):
                grad = np.dot(np.dot(X, beta) - y, X)
                beta[i] = np.minimum(self.max_coef_[i],
                                     np.maximum(self.min_coef_[i],
                                                beta[i] - grad[i] / hessian[i, i]))

        self.coef_ = beta
        self._set_intercept(X_offset, y_offset, X_scale)
        return self

You can use this class, for example, to make all coefficients non-negative

from sklearn.datasets import load_boston  # note: load_boston was removed in scikit-learn 1.2
from sklearn.linear_model import LinearRegression
X, y = load_boston(return_X_y=True)
model = ConstrainedLinearRegression(nonnegative=True)
model.fit(X, y)
print(model.intercept_)
print(model.coef_)

This produces an output like

-36.99292986145538
[0.         0.05286515 0.         4.12512386 0.         8.04017956
 0.         0.         0.         0.         0.         0.02273805
 0.        ]

You can see that most coefficients are zero. An ordinary LinearRegression would have made them negative:

model = LinearRegression()
model.fit(X, y)
print(model.intercept_)
print(model.coef_)

which would give you

36.49110328036191
[-1.07170557e-01  4.63952195e-02  2.08602395e-02  2.68856140e+00
 -1.77957587e+01  3.80475246e+00  7.51061703e-04 -1.47575880e+00
  3.05655038e-01 -1.23293463e-02 -9.53463555e-01  9.39251272e-03
 -5.25466633e-01]

You can also impose arbitrary bounds on any coefficients you choose - that's what you asked for. For example, with this setup

model = ConstrainedLinearRegression()
min_coef = np.repeat(-np.inf, X.shape[1])
min_coef[0] = 0
min_coef[4] = -1
max_coef = np.repeat(4, X.shape[1])
max_coef[3] = 2
model.fit(X, y, max_coef=max_coef, min_coef=min_coef)
print(model.intercept_)
print(model.coef_)

you would get an output

24.060175576410515
[ 0.          0.04504673 -0.0354073   2.         -1.          4.
 -0.01343263 -1.17231216  0.2183103  -0.01375266 -0.7747823   0.01122374
 -0.56678676]

Update. This solution can be adapted to work with constraints on linear combinations of the coefficients (e.g. their sum) - in this case, the individual constraints for each coefficient are recalculated at each step. This GitHub gist provides an example.
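
As a rough, self-contained sketch of that idea (this is not the code from the gist; the data and the max_sum limit are made up), the per-coefficient upper bound can be recomputed from the current values of the other coefficients before every update, so the running sum never exceeds the limit. For anything beyond this toy setup, a dedicated constrained-optimization package such as cvxopt, mentioned in the comments below, is the safer choice:

import numpy as np

# Standalone toy sketch: coordinate descent with the extra constraint sum(beta) <= max_sum;
# the upper bound for each coefficient is recomputed from the other coefficients before every update
def fit_with_sum_constraint(X, y, max_sum, n_sweeps=500):
    n_features = X.shape[1]
    beta = np.zeros(n_features)   # zeros are a feasible start as long as max_sum >= 0
    hessian = np.dot(X.T, X)
    for _ in range(n_sweeps):
        for i in range(n_features):
            grad = np.dot(np.dot(X, beta) - y, X)
            upper = max_sum - (beta.sum() - beta[i])   # bound induced by the sum constraint
            beta[i] = min(upper, beta[i] - grad[i] / hessian[i, i])
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.dot(X, np.array([0.7, 0.6, -0.2])) + rng.normal(scale=0.1, size=200)
beta = fit_with_sum_constraint(X, y, max_sum=1.0)
print(beta, beta.sum())   # the sum stays at or below 1.0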

UPDATE: Due to the popularity of this question, I have created a package with my implementation of constrained linear regression: https://github.com/avidale/constrained-linear-regression. You can install it with pip install constrained-linear-regression. Pull requests are welcome!
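
A minimal usage sketch, assuming the package exposes the same ConstrainedLinearRegression class and fit(X, y, min_coef=..., max_coef=...) interface as the code above, and that the importable module is named constrained_linear_regression; check the project README for the exact API:

import numpy as np
from constrained_linear_regression import ConstrainedLinearRegression  # assumed module name

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.dot(X, np.array([1.0, -2.0, 0.5])) + rng.normal(scale=0.1, size=200)

model = ConstrainedLinearRegression(nonnegative=True)  # assumed to mirror the class above
model.fit(X, y)
print(model.coef_)   # all coefficients expected to be >= 0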

David Dale
  • Thank you for your great answer, it seems that it is exactly what I was looking for. I will try it and give you feedback afterwards! – Benjamin Salem May 23 '18 at 12:50
  • This is an amazing answer. Thanks for sharing it. Even though it's a little bit late for asking, would you also know how to add a constraint on the sum of all the coefficients? For example, for a stock portfolio, not only can the coefficients not be negative, but their sum can't be greater than 1. Thanks! – Angelo Aug 22 '19 at 15:50
  • If you use coordinate descent, then on each step you update only one coefficient. Therefore, the constraint to the sum of coefficients can be represented as a lower/upper bound on this particular coefficient. The only difference is that the value of this constraint would be recalculated on each step. – David Dale Aug 23 '19 at 07:39
  • 1
    Of course, you need to provide a feasible initial solution, and finding it may be a separate problem. If it is getting difficult, you may want to look for specialized packages for constrained optimization (like `cvxopt`), instead of coding the optimization routine by yourself – David Dale Aug 23 '19 at 07:48
  • Please take a look at this gist: OLS with general linear inequality constraints https://gist.github.com/avidale/6668635c318aceebe0142de013a4cf77 – David Dale Aug 23 '19 at 08:14
  • @DavidDale I have used the scipy.optimize.differential_evolution genetic algorithm to find initial parameter estimates, and have had good success with it. – James Phillips Aug 23 '19 at 13:19
  • @DavidDale how can I add lasso/ridge penalties to this code? – throwawaydisplayname Sep 30 '21 at 15:13
  • @throwawaydisplayname incorporating a ridge penalty is pretty easy because the loss function is still quadratic, so you just add `beta` to the gradient and `diag(1)` to the hessian. However, adding lasso will break the quadraticity of the loss, so the Newton algorithm will not work exactly. But you can use it at your own risk. I have wrapped the code, with support for lasso and ridge, as a PyPI package: https://github.com/avidale/constrained-linear-regression – David Dale Sep 30 '21 at 23:16

In version 0.24.2 of scikit-learn, you can force the algorithm to use positive coefficients by passing positive=True to LinearRegression. By multiplying the columns for which you want a negative coefficient by -1, you should get what you want.
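
For example, here is a minimal sketch of that trick on synthetic data (which column should get a negative coefficient, here column 1, is just an illustrative assumption):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.dot(X, np.array([1.0, -2.0, 0.5])) + rng.normal(scale=0.1, size=200)

# Suppose the feature in column 1 must get a negative coefficient: flip its sign first
X_flipped = X.copy()
X_flipped[:, 1] *= -1

model = LinearRegression(positive=True)   # positive= requires scikit-learn >= 0.24
model.fit(X_flipped, y)

coef = model.coef_.copy()
coef[1] *= -1                             # undo the flip: this coefficient is now <= 0
print(coef)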

Benjamin Breton