7

Suppose we are given a training dataset {yᵢ, xᵢ}, for i = 1, ..., n, where yᵢ can either be -1 or 1 and xᵢ can be e.g. a 2D or 3D point.

In general, when the input points are linearly separable, the SVM model can be defined as follows

min 1/2*||w||²
w,b

subject to the constraints (for i = 1, ..., n)

yᵢ*(w*xᵢ - b) >= 1

This is often called the hard-margin SVM model, which is thus a constrained minimization problem, where the unknowns are w and b. We can also omit 1/2 in the function to be minimized, given it's just a constant.

Now, the documentation about Matlab's quadprog states

x = quadprog(H, f, A, b) minimizes 1/2*x'*H*x + f'*x subject to the restrictions A*x ≤ b. A is a matrix of doubles, and b is a vector of doubles.

We can implement the hard-margin SVM model using quadprog function, to get the weight vector w, as follows

  • H becomes an identity matrix.
  • f' becomes a zeros matrix.
  • A is the left-hand side of the constraints
  • b is equal to -1 because the original constraint had >= 1, it becomes <= -1 when we multiply with -1 on both sides.

Now, I am trying to implement a soft-margin SVM model. The minimization equation here is

min (1/2)*||w||² + C*(∑ ζᵢ)
w,b

subject to the constraints (for i = 1, ..., n)

yᵢ*(w*xᵢ - b) >= 1 - ζᵢ

such that ζᵢ >= 0, where is the summation symbol, ζᵢ = max(0, 1 - yᵢ*(w*xᵢ - b)) and C is a hyper-parameter.

How can this optimization problem be solved using the Matlab's quadprog function? It's not clear to me how the equation should be mapped to the parameters of the quadprog function.

The "primal" form of the soft-margin SVM model (i.e. the definition above) can be converted to a "dual" form. I did that, and I am able to get the Lagrange variable values (in the dual form). However, I would like to know if I can use quadprog to solve directly the primal form without needing to convert it to the dual form.

nbro
  • 15,395
  • 32
  • 113
  • 196
user1067334
  • 243
  • 3
  • 7
  • 15

4 Answers4

9

I don't see how it can be a problem. Let z be our vector of (2n + 1) variables:

z = (w, eps, b)

Then, H becomes diagonal matrix with first n values on the diagonal equal to 1 and the last n + 1 set to zero:

H = diag([ones(1, n), zeros(1, n + 1)])

Vector f can be expressed as:

f = [zeros(1, n), C * ones(1, n), 0]'

First set of constrains becomes:

Aineq = [A1, eye(n), zeros(n, 1)]
bineq = ones(n, 1)

where A1 is a the same matrix as in primal form.

Second set of constraints becomes lower bounds:

lb = (inf(n, 1), zeros(n, 1), inf(n, 1))

Then you can call MATLAB:

z = quadprog(H, f, Aineq, bineq, [], [], lb);

P.S. I can be mistaken in some small details, but the general idea is right.

vharavy
  • 4,881
  • 23
  • 30
  • The way the "n" variable was thrown around confused me even more but I got the general idea. The n variable doesn't match up well in quite a few places. Thank you very much. – user1067334 Mar 22 '13 at 02:48
  • @user1067334 Does n represent number of features or number of training samples? – tusharfloyd Nov 01 '15 at 22:52
0

I wanted to clarify @vharavy answer because you could get lost while trying to deduce what 'n' means in his code. Here is my version according to his answer and SVM wikipedia article. I assume we have a file named "test.dat" which holds coordinates of test points and their class membership in the last column. Example content of "test.dat" with 3D points:

-3,-3,-2,-1
-1,3,2,1
5,4,1,1
1,1,1,1
-2,5,4,1
6,0,1,1
-5,-5,-3,-1
0,-6,1,-1
-7,-2,-2,-1

Here is the code:

data = readtable("test.dat");
tableSize = size(data);
numOfPoints = tableSize(1);
dimension = tableSize(2) - 1;
PointsCoords = data(:, 1:dimension);
PointsSide = data.(dimension+1);

C = 0.5; %can be changed
n = dimension;
m = numOfPoints; %can be also interpretet as number of constraints
%z = [w, eps, b]; number of variables in 'z' is equal to n + m + 1
H = diag([ones(1, n), zeros(1, m + 1)]);
f = [zeros(1, n), C * ones(1, m), 0];
Aineq = [-diag(PointsSide)*table2array(PointsCoords), -eye(m), PointsSide];
bineq = -ones(m, 1);
lb = [-inf(1, n), zeros(1, m), -inf];
z = quadprog(H, f, Aineq, bineq, [], [], lb);
JAJA
  • 40
  • 5
-1

If let z = (w; w0; eps)T be a the long vector with n+1+m elements.(m the number of points) Then,

H= diag([ones(1,n),zeros(1,m+1)]).
f = [zeros(1; n + 1); ones(1;m)]

The inequality constraints can be specified as :

A = -diag(y)[X; ones(m; 1); zeroes(m;m)] -[zeros(m,n+1),eye(m)],

where X is the n x m input matrix in the primal form.Out of the 2 parts for A, the first part is for w0 and the second part is for eps.

b = ones(m,1) 

The equality constraints :

Aeq = zeros(1,n+1 +m)
beq = 0

Bounds:

lb = [-inf*ones(n+1,1); zeros(m,1)]
ub = [inf*ones(n+1+m,1)]

Now, z=quadprog(H,f,A,b,Aeq,beq,lb,ub)

akuriako
  • 429
  • 5
  • 7
-1

Complete code. The idea is the same as above.

n = size(X,1);
m = size(X,2);
H = diag([ones(1, m), zeros(1, n + 1)]);
f = [zeros(1,m+1) c*ones(1,n)]';
p = diag(Y) * X;
A = -[p Y eye(n)];
B = -ones(n,1);
lb = [-inf * ones(m+1,1) ;zeros(n,1)];
z = quadprog(H,f,A,B,[],[],lb);
w = z(1:m,:);
b = z(m+1:m+1,:);
eps = z(m+2:m+n+1,:);
  • This code is not runnable. Can you provide a runnable example? For example, you should provide `X`, etc. – nbro May 13 '18 at 18:48