I have the following optimization scheme implemented under NNLS in scipy.

import numpy as np
from scipy.optimize import nnls 
from scipy import stats

#Define problem
A = np.array([[60., 90., 120.], 
              [30., 120., 90.]])

b = np.array([6700.5, 699.,])

# Add ones to ensure the solution sums to 1
b = np.hstack([b,1.0])
A = np.vstack([A,np.ones(3)])

x, rnorm = nnls(A,b)
print(x)
# the solution is
# [ 93.97933792   0.           0.        ]
# we expect it to sum to 1 if it's not skewed

As you can see, the values in b are much larger than the values in A. My question is: what is the best/most reasonable way to scale A and b so that the solution is not skewed?

Note that both A and b are raw gene expression data without any pre-processing.

neversaint
1 Answer

If you want to include the equality constraint, you can't really use the nnls routine, since it doesn't handle equality constraints. If you are limited to what's on offer in scipy, you can use this:

import numpy as np
from scipy.optimize import minimize

#Define problem
A = np.array([[60., 90., 120.], 
              [30., 120., 90.]])

b = np.array([6700.5, 699.,])

#-----------------------------
# I tried rescaling the data by adding these two lines,
# so that they're on the same scale,
# but why is the solution different?
#    x: array([ 1.,  0.,  0.])
# What's the correct way to go?
#-----------------------------
# A = A/np.linalg.norm(A, axis=0)
# b = b/np.linalg.norm(b)

def f(x):
    return np.linalg.norm(A.dot(x) - b)

cons ={'type': 'eq',
       'fun': lambda x: sum(x) - 1}

x0 = [1, 0, 0]  # initial guess
minimize(f, x0, method='SLSQP', bounds=((0, np.inf),)*3, constraints=cons)

Output:

 status: 0
 success: True
    njev: 2
    nfev: 10
     fun: 6608.620222860367
       x: array([ 0.,  0.,  1.])
 message: 'Optimization terminated successfully.'
     jac: array([ -62.50927734, -100.675354  , -127.78314209,    0.        ])
     nit: 2

This minimises the objective function directly while also imposing the equality constraint you're interested in.

If speed is important, you can add the jacobian and hessian information, or even better, use a proper QP solver, as supplied by cvxopt.
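
As an illustration (not from the original answer), here is a minimal sketch of the Jacobian route. It minimises the squared residual norm, which has the same minimiser as the norm itself (see the comments below) but gives a simple analytic gradient, 2*A.T.dot(A.dot(x) - b):

import numpy as np
from scipy.optimize import minimize

A = np.array([[60., 90., 120.],
              [30., 120., 90.]])

b = np.array([6700.5, 699.])

# Squared residual norm: same minimiser as the norm, simpler gradient
def f(x):
    r = A.dot(x) - b
    return r.dot(r)

def jac_f(x):
    return 2.0 * A.T.dot(A.dot(x) - b)

cons = {'type': 'eq',
        'fun': lambda x: np.sum(x) - 1,
        'jac': lambda x: np.ones_like(x)}  # gradient of sum(x) - 1

x0 = [1, 0, 0]  # initial guess
res = minimize(f, x0, method='SLSQP', jac=jac_f,
               bounds=((0, np.inf),) * 3, constraints=cons)
print(res.x)

Supplying jac this way avoids the finite-difference gradient evaluations that minimize would otherwise perform.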

chthonicdaemon
  • For clarity, would you mind including the complete code based on my OP? For example, it's not clear to me whether `A` and `b` here are before or after adding the *ones* (1s). – neversaint Oct 29 '15 at 05:55
  • Thanks. Is your method equivalent to [scipy.optimize.fmin_slsqp](http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.optimize.fmin_slsqp.html)? – neversaint Oct 29 '15 at 06:46
  • Yes. minimize is just a general interface. Consider accepting the answer if it solves your problem. – chthonicdaemon Oct 29 '15 at 07:05
  • If there is no *equality constraint* what's the preferable method to use? `BFGS`, `CG`? – neversaint Oct 29 '15 at 07:12
  • do you mean `nnls`? So let me clarify for my understanding: *with and without the equality constraint* SLSQP is OK, and if *strictly without the equality constraint* then `nnls` is the way to go. – neversaint Oct 29 '15 at 07:17
  • Yes, I meant nnls. SLSQP will handle both the nonnegativity constraints and the sum constraint. nnls will solve the problem more efficiently. – chthonicdaemon Oct 29 '15 at 07:22
  • I have one last question. Would you mind seeing my edit to your post? I made a comment inside the code. – neversaint Oct 29 '15 at 07:34
  • Hi @neversaint and @chthonicdaemon, the set of equations here has no solution and the returned solution of `x=[0,0,1]` is clearly incorrect. If you view the condition that `sum(x)=1` as a third equation (see linked [question](http://stackoverflow.com/questions/33385898/how-to-include-constraint-to-scipy-nnls-function-solution-so-that-it-sums-to-1?lq=1)) then it can be solved exactly with `x=[-59.9125, -99.525 , 160.4375]`. The set of equations is overconstrained with the requirement that x is positive and sums to one (the method here is the right way to solve in general otherwise). – Ed Smith Oct 29 '15 at 10:03
  • @EdSmith I agree that x=[0, 0, 1] does not satisfy Ax=b, but the problem we are solving is to minimize the norm of Ax-b subject to x>0 and sum(x)=1 constraints. Do you mean that x=[0, 0, 1] does not minimize the norm when you say it is clearly incorrect? Without the equality, this is exactly the problem [nnls](http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.optimize.nnls.html) solves. – chthonicdaemon Oct 29 '15 at 10:43
  • @neversaint the reason your scaling leads to a different solution is because you are changing the problem by scaling left and right with different factors. You would only expect the same solution if you scaled uniformly on both sides. – chthonicdaemon Oct 29 '15 at 10:45
  • @chthonicdaemon: So one should not scale the way I do? And how do you suggest I can *scale uniformly*? Would you please give example? – neversaint Oct 29 '15 at 10:55
  • Hi @chthonicdaemon, sorry you're right, I got hung up on getting an exact solution (which is almost possible with `b=[67,60]` in the linked question). I guess the point here is `minimise` gets as close as possible (in a least squared sense) to solving the constrained problem which has no exact solution. – Ed Smith Oct 29 '15 at 10:57
  • @neversaint The problem is only *scaled* if you take Ax=b and find a new problem DAx = Db. These are algebraically equivalent, and usually D is a suitable diagonal matrix. You can, for instance, choose D to have the inverses of b on the diagonal, which will make Db=[1, 1]^T. You now solve the problem with DA as the coefficient matrix. When you multiply by different factors left and right, you end up with a different set of equations. (A sketch of this is given after these comments.) – chthonicdaemon Oct 29 '15 at 13:36
  • @chthonicdaemon: Ok. I'm just curious why didn't you use `np.linalg.norm((A.dot(x) - b)**2)` ? i.e with squared. – neversaint Oct 30 '15 at 00:30
  • Since it adds computational cost, but doesn't change the location of the minimum. – chthonicdaemon Oct 30 '15 at 03:02
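
To make the scaling comment concrete, here is a rough sketch (my reading of the comment above, not code from the answer) of scaling both sides by the same diagonal matrix D = diag(1/b), so that Ax = b becomes DAx = Db with Db = [1, 1]:

import numpy as np
from scipy.optimize import minimize

A = np.array([[60., 90., 120.],
              [30., 120., 90.]])

b = np.array([6700.5, 699.])

# Scale both sides with the same diagonal matrix D = diag(1/b),
# turning Ax = b into the algebraically equivalent system DAx = Db.
D = np.diag(1.0 / b)
A_scaled = D.dot(A)
b_scaled = D.dot(b)  # == np.ones(2)

def f(x):
    return np.linalg.norm(A_scaled.dot(x) - b_scaled)

cons = {'type': 'eq', 'fun': lambda x: np.sum(x) - 1}
res = minimize(f, [1, 0, 0], method='SLSQP',
               bounds=((0, np.inf),) * 3, constraints=cons)
print(res.x)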