
I have a classic linear regression problem of the form:

y = X b

where y is a response vector, X is a matrix of input variables, and b is the vector of fit parameters I am searching for.

NumPy provides numpy.linalg.lstsq(X, y) for solving problems of this form (the first element of its return value is b).

However, when I use this I tend to get either extremely large or extremely small values for the components of b.

I'd like to perform the same fit, but constrain the values of b between 0 and 255.

It looks like scipy.optimize.fmin_slsqp() is an option, but I found it extremely slow for problems of the size I'm interested in (X is something like 3375 by 1500, and hopefully even larger).

  1. Are there any other Python options for performing constrained least squares fits?
  2. Or are there Python routines for performing Lasso Regression, Ridge Regression, or some other regression method which penalizes large b coefficient values?
ulmangt
  • sklearn LASSO: https://www.google.com/search?client=safari&rls=en&q=lasso+regression+sklearn&ie=UTF-8&oe=UTF-8 – anon01 Sep 04 '20 at 00:44

5 Answers


Recent SciPy versions include a bounded linear least-squares solver, scipy.optimize.lsq_linear:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.lsq_linear.html#scipy.optimize.lsq_linear
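For example, a minimal sketch of the call with the 0-255 bounds from the question (the data below is random and purely illustrative):

import numpy as np
from scipy.optimize import lsq_linear

# Random stand-in for the real data (the question's X is roughly 3375 x 1500)
X = np.random.rand(300, 150)
y = np.random.rand(300)

# Constrain every coefficient to the range [0, 255]
res = lsq_linear(X, y, bounds=(0, 255))
b = res.x          # fitted coefficients, all within the bounds
print(res.cost)    # 0.5 * sum of squared residuals at the solution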

tillsten
  • Nice, on the surface that sounds like exactly what I need. Being able to provide weights to the rows of the `X` input matrix may also be very useful to me (I have a sense of the reliability of the various data points, which that would let me take advantage of). I'll definitely give it a try, thanks! – ulmangt Apr 14 '12 at 15:58
  • It is not really well tested though; hope it will work for you. The code is pure Python and should be easy to test. – tillsten Apr 14 '12 at 15:59
  • 1
    `scipy.opimize.nnls` is a good tip as well. Simply constraining to non-negative values may indeed be enough. `numpy.linalg.lstsq` solutions seemed to be balancing out huge positive `b` values with equally huge negative `b` values. – ulmangt Apr 14 '12 at 19:05
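A minimal sketch of the non-negative variant mentioned in the comment above (again with random placeholder data):

import numpy as np
from scipy.optimize import nnls

X = np.random.rand(50, 10)
y = np.random.rand(50)

# nnls solves min ||X b - y||_2 subject to b >= 0
b, residual_norm = nnls(X, y)
print(b)
print(residual_norm)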

You mention you would find Lasso Regression or Ridge Regression acceptable. These and many other constrained linear models are available in the scikit-learn package. Check out the section on generalized linear models.

Usually constraining the coefficients involves some kind of regularization parameter (C or alpha); some of the models (the ones ending in CV) can use cross-validation to set these parameters automatically. You can also further constrain models to use only positive coefficients: for example, there is an option for this on the Lasso model.
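As a rough sketch of the regularized fits described above (the alpha values here are arbitrary; the CV variants such as LassoCV would pick them automatically):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

X = np.random.rand(100, 20)
y = np.random.rand(100)

# Ridge regression: L2 penalty shrinks large coefficients
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso regression: L1 penalty; positive=True also forces coefficients >= 0
lasso = Lasso(alpha=0.1, positive=True).fit(X, y)

print(ridge.coef_)
print(lasso.coef_)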

conradlee

scipy-optimize-leastsq-with-bound-constraints on SO gives leastsq_bounds, which is scipy's leastsq plus bound constraints such as 0 <= x_i <= 255.
(SciPy's leastsq wraps MINPACK, one of several implementations of the widely used Levenberg–Marquardt algorithm, a.k.a. damped least squares.
There are various ways of implementing bounds; leastsq_bounds is, I think, the simplest.)
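leastsq_bounds itself comes from the linked answer, so I won't reproduce its exact signature here; for comparison, newer SciPy versions also ship scipy.optimize.least_squares, which accepts box bounds directly. A rough sketch with the 0-255 bounds (random data, residual function written out by hand):

import numpy as np
from scipy.optimize import least_squares

X = np.random.rand(100, 20)
y = np.random.rand(100)

def residuals(b):
    # Residual vector of the linear model y ~ X b
    return X.dot(b) - y

# Box constraints 0 <= b_i <= 255, starting from the middle of the range
res = least_squares(residuals, x0=np.full(20, 128.0), bounds=(0, 255))
print(res.x)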

denis

As @conradlee says, you can find Lasso and Ridge Regression implementations in the scikit-learn package. These regressors serve your purpose if you just want your fit parameters to be small or positive.

However, if you want to impose any other range as a bound for the fit parameters, you can build your own constrained regressor with the same package. See the answer by David Dale to this question for an example; a rough sketch of the same idea follows below.
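A rough sketch of that idea, here delegating the bounded solve to scipy.optimize.lsq_linear inside a scikit-learn-style estimator (the class name and details are illustrative, not David Dale's exact code):

import numpy as np
from scipy.optimize import lsq_linear
from sklearn.base import BaseEstimator, RegressorMixin

class BoundedLinearRegression(BaseEstimator, RegressorMixin):
    """Least-squares fit with box constraints on the coefficients."""
    def __init__(self, lower=0.0, upper=255.0):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y):
        res = lsq_linear(X, y, bounds=(self.lower, self.upper))
        self.coef_ = res.x
        return self

    def predict(self, X):
        return X.dot(self.coef_)

# Usage with random placeholder data
X = np.random.rand(100, 20)
y = np.random.rand(100)
reg = BoundedLinearRegression(lower=0, upper=255).fit(X, y)
print(reg.coef_)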

Bremsstrahlung

I recently prepared some tutorials on Linear Regression in Python. Here is one of the options (Gekko) that includes constraints on the coefficients.

# Constrained Multiple Linear Regression
import numpy as np
nd = 100 # number of data points
nc = 5   # number of inputs
x = np.random.rand(nd,nc)
y = np.random.rand(nd)

from gekko import GEKKO
m = GEKKO(remote=False); m.options.IMODE=2  # IMODE=2: parameter regression mode
c  = m.Array(m.FV,nc+1)          # fitted coefficients (the last one is the intercept)
for ci in c:
    ci.STATUS=1                  # let the solver adjust this coefficient
    ci.LOWER = -10               # lower bound on the coefficient
    ci.UPPER =  10               # upper bound on the coefficient
xd = m.Array(m.Param,nc)
for i in range(nc):
    xd[i].value = x[:,i]
yd = m.Param(y); yp = m.Var()    # measured response and model prediction
s =  m.sum([c[i]*xd[i] for i in range(nc)])
m.Equation(yp==s+c[-1])          # linear model with intercept
m.Minimize((yd-yp)**2)           # least-squares objective
m.solve(disp=True)
a = [c[i].value[0] for i in range(nc+1)]
print('Solve time: ' + str(m.options.SOLVETIME))
print('Coefficients: ' + str(a))

It uses the nonlinear solver IPOPT, which handles this problem better than the scipy.optimize.minimize solvers. There are other constrained optimization methods in Python as well, as discussed in Is there a high quality nonlinear programming solver for Python?.

John Hedengren