Does CVXPY (with the ECOS solver) require to scale the decision variables?

Question

I am new to CVXPY and I am not an optimization expert, so please be patient with me. I am trying to solve a very standard problem, i.e., the (ordinary) least squares problem, in CVXPY using the solver ECOS. I am aware that there are very specialized algorithms to solve this class of problems, but I chose it as a learning example to familiarize myself with CVXPY/ECOS due to its conceptual simplicity and widespread usage. The problem can be formulated as

where, as usual, y is a vector of dimension n × 1, X is a n × p design matrix, X_j represents the vector corresponding to column j of the matrix X, β is the p × 1 parameter vector, and β_j is the j^th parameter.

Clearly, this is a quadratic minimization problem that the ECOS solver (being a conic optimizer) should be able to solve, however, I observed an unexpected behavior, namely that the solver's ability to find a solution depends on the magnitude of the input data.

Specifically, as can be seen in my code snippet below, I generated a mock random vector y and matrix X ^* and tried to solve the least squares problem two times: one with the variables X and y as is and a second time with the variables X and y scaled (y₂ = y × 10,000,000 and the columns 2 to 14 of X multiplied by 1,000). To my surprise, ECOS found a solution to the first problem (unscaled data), but failed to do so and called the second problem (scaled data) infeasible.

Why does this happen? Is there a limitation of the variable scale in the ECOS solver or in the CVXPY interface?

Perhaps the CVXPY "atom" sum_squares returns a value that is too large when the data is scaled, making the solver conclude that the problem is infeasible. If that is the case, how should I scale X and y, and what should I do to recover the "correct" solution (i.e., in the same scale as the original data)?

import cvxpy as cp
import numpy as np


m = 133
n = 17

X = np.empty((m, n))
X[:, 0] = 1
X[:, 1:n-3] = np.random.rand(m, n-4)
X[:, n-3:] = np.random.randint(low=0, high=2, size=(m, 3))
X_1 = X.copy()
X_2 = X.copy()
X_2[:, 1:n-3] = X_2[:, 1:n-3] * 1_000
y = np.random.rand(m)
y_1 = y.copy()
y_2 = y.copy() * 10_000_000

params = cp.Variable(n)
objective_1 = cp.Minimize(cp.sum_squares(y_1 - X_1 @ params))
objective_2 = cp.Minimize(cp.sum_squares(y_2 - X_2 @ params))
problem_1 = cp.Problem(objective_1)
problem_2 = cp.Problem(objective_2)

problem_1.solve(solver=cp.ECOS, verbose=True)
print("optimal value with ECOS:", problem_1.value)
print("solution", params.value)

problem_2.solve(solver=cp.ECOS, verbose=True)
print("optimal value with ECOS:", problem_1.value)
print("solution", params.value)

Below is the CVXPY/ECOS output of the two problems for reference.

problem_1.solve(solver=cp.ECOS, verbose=True)

===============================================================================
                                     CVXPY
                                     v1.3.1
===============================================================================
(CVXPY) Mar 24 06:33:24 PM: Your problem has 17 variables, 0 constraints, and 0 parameters.
(CVXPY) Mar 24 06:33:24 PM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Mar 24 06:33:24 PM: (If you need to solve this problem multiple times, but with different data, consider using parameters.)
(CVXPY) Mar 24 06:33:24 PM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
-------------------------------------------------------------------------------
                                  Compilation
-------------------------------------------------------------------------------
(CVXPY) Mar 24 06:33:24 PM: Compiling problem (target solver=ECOS).
(CVXPY) Mar 24 06:33:24 PM: Reduction chain: Dcp2Cone -> CvxAttr2Constr -> ConeMatrixStuffing -> ECOS
(CVXPY) Mar 24 06:33:24 PM: Applying reduction Dcp2Cone
(CVXPY) Mar 24 06:33:24 PM: Applying reduction CvxAttr2Constr
(CVXPY) Mar 24 06:33:24 PM: Applying reduction ConeMatrixStuffing
(CVXPY) Mar 24 06:33:24 PM: Applying reduction ECOS
(CVXPY) Mar 24 06:33:24 PM: Finished problem compilation (took 1.675e-02 seconds).
-------------------------------------------------------------------------------
                                Numerical solver
-------------------------------------------------------------------------------
(CVXPY) Mar 24 06:33:24 PM: Invoking solver ECOS  to obtain a solution.

ECOS 2.0.10 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS

It     pcost       dcost      gap   pres   dres    k/t    mu     step   sigma     IR    |   BT
 0  +0.000e+00  -5.259e-03  +1e+02  5e-01  5e-04  1e+00  5e+01    ---    ---    1  1  - |  -  -
 1  +2.044e-02  +2.501e-02  +4e+00  3e-02  2e-05  4e-02  2e+00  0.9617  1e-04   1  2  2 |  0  0
 2  +1.874e+00  +2.913e+00  +1e+00  2e-02  1e-05  1e+00  7e-01  0.9890  4e-01   2  2  2 |  0  0
 3  +6.400e+00  +8.939e+00  +2e-01  1e-02  7e-06  3e+00  1e-01  0.9862  2e-01   3  3  3 |  0  0
 4  +8.877e+00  +9.436e+00  +2e-02  2e-03  1e-06  6e-01  2e-02  0.8759  3e-03   3  2  2 |  0  0
 5  +9.473e+00  +1.059e+01  +1e-02  2e-03  1e-06  1e+00  9e-03  0.7220  4e-01   3  3  3 |  0  0
 6  +1.061e+01  +1.062e+01  +3e-04  5e-05  2e-08  1e-02  2e-04  0.9823  3e-03   3  2  2 |  0  0
 7  +1.063e+01  +1.064e+01  +4e-05  5e-06  3e-09  1e-03  2e-05  0.8899  4e-03   3  3  3 |  0  0
 8  +1.064e+01  +1.064e+01  +1e-06  7e-07  1e-10  1e-04  1e-06  0.9836  4e-02   4  3  3 |  0  0
 9  +1.064e+01  +1.064e+01  +5e-08  1e-08  5e-12  4e-06  4e-08  0.9644  1e-04   3  2  2 |  0  0
10  +1.064e+01  +1.064e+01  +3e-09  2e-08  3e-14  2e-07  2e-09  0.9593  1e-02   4  2  2 |  0  0
11  +1.064e+01  +1.064e+01  +4e-11  2e-09  3e-14  3e-09  3e-11  0.9866  4e-04   4  2  2 |  0  0

OPTIMAL (within feastol=2.4e-09, reltol=3.3e-12, abstol=3.5e-11).
Runtime: 0.007806 seconds.

-------------------------------------------------------------------------------
                                    Summary
-------------------------------------------------------------------------------
(CVXPY) Mar 24 06:33:24 PM: Problem status: optimal
(CVXPY) Mar 24 06:33:24 PM: Optimal value: 1.064e+01
(CVXPY) Mar 24 06:33:24 PM: Compilation took 1.675e-02 seconds
(CVXPY) Mar 24 06:33:24 PM: Solver (including time spent in interface) took 8.717e-03 seconds
10.640240221833896

problem_2.solve(solver=cp.ECOS, verbose=True)

===============================================================================
                                     CVXPY
                                     v1.3.1
===============================================================================
(CVXPY) Mar 24 06:34:04 PM: Your problem has 17 variables, 0 constraints, and 0 parameters.
(CVXPY) Mar 24 06:34:04 PM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Mar 24 06:34:04 PM: (If you need to solve this problem multiple times, but with different data, consider using parameters.)
(CVXPY) Mar 24 06:34:04 PM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
-------------------------------------------------------------------------------
                                  Compilation
-------------------------------------------------------------------------------
(CVXPY) Mar 24 06:34:04 PM: Compiling problem (target solver=ECOS).
(CVXPY) Mar 24 06:34:04 PM: Reduction chain: Dcp2Cone -> CvxAttr2Constr -> ConeMatrixStuffing -> ECOS
(CVXPY) Mar 24 06:34:04 PM: Applying reduction Dcp2Cone
(CVXPY) Mar 24 06:34:04 PM: Applying reduction CvxAttr2Constr
(CVXPY) Mar 24 06:34:04 PM: Applying reduction ConeMatrixStuffing
(CVXPY) Mar 24 06:34:04 PM: Applying reduction ECOS
(CVXPY) Mar 24 06:34:04 PM: Finished problem compilation (took 6.039e-03 seconds).
-------------------------------------------------------------------------------
                                Numerical solver
-------------------------------------------------------------------------------
(CVXPY) Mar 24 06:34:04 PM: Invoking solver ECOS  to obtain a solution.

ECOS 2.0.10 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS

It     pcost       dcost      gap   pres   dres    k/t    mu     step   sigma     IR    |   BT
 0  +0.000e+00  -1.789e-04  +3e+07  1e-01  7e-06  1e+00  2e+07    ---    ---    1  1  - |  -  -
 1  -4.612e+07  -4.427e+07  +3e+07  3e+01  7e-04  2e+06  2e+07  0.0011  1e+00   1  1  1 |  0  0
 2  -4.510e+07  -2.988e+07  +1e+06  1e+00  2e-05  2e+07  5e+05  0.9704  1e-04   2  1  1 |  0  0
 3  +6.561e+08  +8.610e+08  +1e+04  2e-01  1e-04  2e+08  5e+03  0.9890  1e-04   2  2  2 |  0  0
 4  +5.427e+09  +6.944e+09  +1e+02  2e-02  5e-05  2e+09  6e+01  0.9890  1e-04   2  1  1 |  0  0
 5  +9.734e+09  +1.050e+10  +4e+00  2e-03  3e-05  8e+08  2e+00  0.9582  1e-04   2  2  2 |  0  0
 6  +1.566e+10  +1.762e+10  +5e-01  1e-03  1e-03  2e+09  4e-01  0.9890  1e-01   1  2  2 |  0  0
 7  +2.388e+10  +4.886e+10  +2e-01  3e-03  2e-02  2e+10  1e-01  0.9890  2e-01   3  2  2 |  0  0
 8  +6.709e+10  +8.528e+10  +6e-03  3e-04  4e-04  2e+10  5e-03  0.9865  3e-02   2  2  3 |  0  0
 9  +1.056e+11  +1.318e+11  +5e-04  1e-04  7e-03  3e+10  7e-04  0.9410  6e-02   3  2  2 |  0  0

PRIMAL INFEASIBLE (within feastol=7.6e-09).
Runtime: 0.005062 seconds.

-------------------------------------------------------------------------------
                                    Summary
-------------------------------------------------------------------------------
(CVXPY) Mar 24 06:34:04 PM: Problem status: infeasible
(CVXPY) Mar 24 06:34:04 PM: Optimal value: inf
(CVXPY) Mar 24 06:34:04 PM: Compilation took 6.039e-03 seconds
(CVXPY) Mar 24 06:34:04 PM: Solver (including time spent in interface) took 4.988e-04 seconds
inf

The script was executed on a Windows 10 Pro version 22H2 machine with a 11th Gen Intel(R) Core(TM) i5 Intel processor running Python 3.10.9, CVXPY 1.3.1, ECOS 2.0.11, and NumPy 1.23.5.

^* Note that the first column of X has all entries equal to 1 to mimic the intercept of an OLS design matrix 1 and the last three columns are binary to mimic dummy variables (due to the random nature of the mock data, we cannot exclude multicollinearity but the chance of it occurring is small, I always checked X for that before attempting to solve the problem, and it should not matter in this context).

I did read the answer to this [question](https://stackoverflow.com/questions/65526377/cvxpy-returns-infeasible-inaccurate-on-quadratic-programming-optimization-proble?rq=2), where it was suggested to replace the quadratic function `sum_squares` with `norm` (I'm assuming I don't have to square the l2-norm afterward). This replacement makes my problem 2 solvable by ECOS, but then I still don't understand what the root cause of the issue is. — AB8, Mar 25 '23 at 00:45
I see no surprise here. Of course the solvers are very sensitive to scaling, because the worse scaling the more error is introduced in floating point operations. In particular in your problem you are potentially forcing the solver to be looking at squared norm of a vector with entries potentially of order 10^7. You see this cannot work stably. I recommend https://docs.mosek.com/modeling-cookbook/practical.html from section 7.2 — Michal Adamaszek, Mar 26 '23 at 07:33
@MichalAdamaszek I appreciate your comment and the (very approachable) resource you shared. I "sort of" knew it, but I was looking for confirmation as I never faced it (to this extent) in a real problem. Is there a recommended scaling approach to stabilize the problem? I understand that it should be reformulated replacing `sum_squares` with `norm` (i.e., elevating the objective function to the 1/2 power), but how should I scale *X* and *y* and how would I recover the solution in the initial scale? — AB8, Mar 27 '23 at 13:28
I think I fail to understand why you want to perform that rescaling by 1e7 and compare the results of the two problems. If your data is good then use it and if it causes numerical issues then rescale it, but most likely down and not up. — Michal Adamaszek, Mar 28 '23 at 12:40
@MichalAdamaszek Given the different measurement units and magnitude of the variables in my problem, I thought of *standardizing* all variables (*y* and all columns of *X*). If I were to do that, I would need to convert the solution to the original scale. In a "classic" OLS setting, the conversion is done by multiplying the solution by std(y)/std(X) (except for the first element of the solution) as described [here](https://stats.stackexchange.com/questions/74622/converting-standardized-betas-back-to-original-variables), but I am interested in an optimization-based approach. — AB8, Mar 28 '23 at 15:06

Does CVXPY (with the ECOS solver) require to scale the decision variables?

0 Answers0