
I am using statsmodels (open to other Python options) to run some linear regression. My problem is that I need the regression to have no intercept, to constrain the coefficients to the range (0, 1), and to have them sum to 1.

I tried something like this (for the sum of 1, at least):

from statsmodels.formula.api import glm
import pandas as pd

df = pd.DataFrame({'revised_guess':[0.6], "self":[0.55], "alter_1":[0.45], "alter_2":[0.2],"alter_3":[0.8]})
# "- 1" in the formula drops the intercept
mod = glm("revised_guess ~ self + alter_1 + alter_2 + alter_3 - 1", data=df)
res = mod.fit_constrained(["self + alter_1 + alter_2 + alter_3 = 1"],
                          start_params=[0.25, 0.25, 0.25, 0.25])
res.summary()

but I am still struggling to enforce the non-negativity constraint on the coefficients.

amaatouq
  • It looks like your problem falls into the [linear programming](https://en.wikipedia.org/wiki/Linear_programming) family of models. I am not sure that statsmodels supports that. – Luka Rahne Mar 11 '19 at 16:03
  • I believe you may be looking for [`sklearn.linear_model.LinearRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) – Jab Mar 12 '19 at 16:45
  • Please help me understand - How can you make a negative coefficient positive? If an `x` has a negative relationship with your `y`, what do you mean by constraining its coefficient into the (0,1) range? How can you revert a negative relationship to a positive one? – FatihAkici Mar 18 '19 at 19:25
  • @FatihAkici Since you didn't get a response to your question: forcing the coefficients to be positive makes sense in certain contexts where you are looking for the optimal combination of inputs and negative weights are infeasible. E.g. if I want to find the optimal weight to give to the effort of each team member as a function of their skills, I cannot place a negative weight on someone. Your doubt makes sense if you only consider estimating an empirical relationship, e.g. the correlation between rainfall and umbrella use, but regression analysis can be used for a wealth of other reasons. – Joe Emmens Jan 03 '22 at 09:01
  • Possibly a duplicate : https://stackoverflow.com/q/33385898/6151828 – Roger Vadim Sep 29 '22 at 07:44
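
On the sklearn suggestion above: LinearRegression accepts positive=True, which enforces non-negative coefficients (it does not by itself cap them at 1 or make them sum to 1). A minimal sketch on illustrative data:

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: 5 observations, 4 predictors
rng = np.random.default_rng(0)
X = rng.random((5, 4))
y = rng.random(5)

# No intercept, coefficients constrained to be non-negative
reg = LinearRegression(fit_intercept=False, positive=True).fit(X, y)
print(reg.coef_)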

2 Answers


You could use NNLS (non-negative least squares), which is available in SciPy and wraps the classic FORTRAN non-negative least-squares solver. You can't add equality constraints to it directly, so append another equation, x1 + x2 + x3 = 1, as an extra row of the input system.

import numpy as np
from scipy.optimize import nnls 
# Define the inputs; the last row of A and the last entry of b
# encode the extra equation x1 + x2 + x3 = 1
A = np.array([[1., 2., 5.],
              [5., 6., 4.],
              [1., 1., 1.]])

b = np.array([4., 7., 1.])

# Calculate the non-negative least-squares solution
x, residual_norm = nnls(A, b)

# Find the difference between A @ x and b for each equation
print(np.sum(A * x, 1) - b)

Now perform NNLS on this augmented system; it returns the non-negative coefficient vector x and the residual norm. Note that the appended row is only satisfied in the least-squares sense, so the coefficients will sum only approximately to 1.
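
If the sum-to-one row needs to hold almost exactly, one option is to weight it heavily so that it dominates the least-squares objective. A sketch of that idea on the same two data rows, with an arbitrarily chosen weight:

import numpy as np
from scipy.optimize import nnls

# Data rows only
X = np.array([[1., 2., 5.],
              [5., 6., 4.]])
y = np.array([4., 7.])

w = 1e6  # large weight so the sum-to-one row dominates the fit
A = np.vstack([X, w * np.ones(X.shape[1])])
b = np.append(y, w * 1.0)

x, residual_norm = nnls(A, b)
print(x, x.sum())  # non-negative coefficients summing to ~1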

Justice_Lords

Simply do an L1-regularized regression:

import statsmodels.api as sm
from statsmodels.regression.linear_model import OLS
# Y is the response vector, X the design matrix (no constant column, so no intercept)
model = sm.OLS(Y, X)
model2 = model.fit_regularized(method='elastic_net', alpha=0.0, L1_wt=1.0,
                               start_params=None, profile_scale=False, refit=False)
model2.params

... and tune hyperparameters.
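
A self-contained version of that call, assuming synthetic data and an illustrative non-zero alpha (with alpha=0.0 the penalty is inactive and the fit reduces to plain OLS):

import numpy as np
import statsmodels.api as sm

# Synthetic data: 50 observations, 4 predictors, no constant column (no intercept)
rng = np.random.default_rng(0)
X = rng.random((50, 4))
Y = X @ np.array([0.4, 0.3, 0.2, 0.1]) + 0.01 * rng.standard_normal(50)

model = sm.OLS(Y, X)
# L1_wt=1.0 makes the elastic-net penalty a pure lasso; alpha controls its strength
fit = model.fit_regularized(method='elastic_net', alpha=0.01, L1_wt=1.0)
print(fit.params)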

razimbres
  • How can you make a negative coefficient positive though? If an x has a negative relationship with y, what does it mean to constrain its coefficient into the (0,1) range? How can one revert a negative relationship to a positive one? – FatihAkici Mar 19 '19 at 16:13