1

I have implemented a script that does constrained optimization for solving the optimal parameters of Support Vector Machines model. I noticed that my script for some reason gives inaccurate results (although very close to the real value). For example the typical situation is that the result of a calculation should be exactly 0, but instead it is something like

-1/18014398509481984 = -5.551115123125783e-17

This situation happens when I multiply matrices with vectors. What makes this also strange is that if I do the multiplications by hand in the command window in Matlab I get exactly 0 result.

Let me give an example: If I take the vectors Aq = [-1 -1 1 1] and x = [12/65 28/65 32/65 8/65]' I get exactly 0 result from their multiplication if I do this in the command window, as you can see in the picture below:

enter image description here

If on the other hand I do this in my function-script I don't get the result being 0 but rather the value -1/18014398509481984.

Here is the part of my script that is responsible for this multiplication (I've added the Aq and x into the script to show the contents of Aq and x as well):

disp('DOT PRODUCT OF ACTIVE SET AND NEW POINT: ')
Aq
x
Aq*x

Here is the result of the code above when run:

enter image description here

As you can see the value isn't exactly 0 even though it really should be. Note that this problem doesn't occur for all possible values of Aq and x. If Aq = [-1 -1 1 1] and x = [4/13 4/13 4/13 4/13] the result is exactly 0 as you can see below:

enter image description here

What is causing this inaccuracy? How can I fix this?

P.S. I didn't include my whole code because it's not very well documented and few hundred lines long, but I will if requested.

Thank you!

UPDATE: new test, by using Ander Biguri's advice:

enter image description here

UPDATE 2: THE CODE

function [weights, alphas, iters] = solveSVM(data, labels, C, e)
% FUNCTION [weights, alphas, iters] = solveSVM(data, labels, C, e)
% 
% AUTHOR: jjepsuomi
%
% VERSION: 1.0
%
% DESCRIPTION: 
% - This function will attempt to solve the optimal weights for a Support
% Vector Machines (SVM) model using active set method with gradient
% projection.
% 
% INPUTS: 
% "data" a n-by-m data matrix. The number of rows 'n' corresponds to the
% number of data points and the number of columns 'm' corresponds to the
% number of variables. 
% "labels" a 1-by-n row vector of data labels from the set {-1,1}. 
% "C" Box costraint upper limit. This will constrain the values of 'alphas'
% to the range 0 <= alphas <= C. If hard-margin SVM model is required set
% C=Inf. 
% "e" a real value corresponding to the convergence criterion, that is if
% solution Xi and Xi-1 are within distance 'e' from each other stop the
% learning process, i.e. IF |F(Xi)-F(Xi-1)| < e ==> stop learning process.
%
% OUTPUTS: 
% "weights" a vector corresponding to the optimal decision line parameters.
% "alphas" a vector of alpha-values corresponding to the optimal solution
% of the dual optimization problem of SVM.
% "iters" number of iterations until learning stopped.
%
% EXAMPLE USAGE 1: 
% 
% 'Hard-margin SVM': 
%
% data = [0 0;2 2;2 0;3 0];
% labels = [-1 -1 1 1];
% [weights, alphas, iters] = solveSVM(data, labels, Inf, 10^-100)
% 
% EXAMPLE USAGE 2:
%
% 'Soft-margin SVM':
%
% data = [0 0;2 2;2 0;3 0];
% labels = [-1 -1 1 1];
% [weights, alphas, iters] = solveSVM(data, labels, 0.8, 10^-100)


% STEP 1: INITIALIZATION OF THE PROBLEM
format long
% Calculate linear kernel matrix
L = kron(labels', labels);
K = data*data';
% Hessian matrix
Qd = L.*K;
% The minimization function 
L = @(a) (1/2)*a'*Qd*a - ones(1, length(a))*a;
% Gradient of the minimizable function 
gL = @(a) a'*Qd - ones(1, length(a));


% STEP 2: THE LEARNING PROCESS, ACTIVE SET WITH GRADIENT PROJECTION

% Initial feasible solution (required by gradient projection)
x = zeros(length(labels), 1);
iters = 1;
optfound = 0;

while optfound == 0 % criterion met    

    % Negative of the gradient at initial solution
    g = -gL(x);
    % Set the active set and projection matrix
    Aq = labels; % In plane y^Tx = 0
    P = eye(length(x))-Aq'*inv(Aq*Aq')*Aq; % In plane projection
    % Values smaller than 'eps' are changed into 0
    P(find(abs(P-0) < eps)) = 0;
    d = P*g'; % Projection onto plane

    if ~isempty(find(x==0 | x==C)) % Constraints active?
        acinds = find(x==0 | x==C);
        for i = 1:length(acinds)
            if (x(acinds(i)) == 0 && d(acinds(i)) < 0) || x(acinds(i)) == C && d(acinds(i)) > 0
                % Make the constraint vector
                constr = zeros(1,length(x));
                constr(acinds(i)) = 1;
                Aq = [Aq; constr];
            end
        end
        % Update the projection matrix
        P = eye(length(x))-Aq'*inv(Aq*Aq')*Aq; % In plane / box projection
        % Values smaller than 'eps' are changed into 0
        P(find(abs(P-0) < eps)) = 0;
        d = P*g'; % Projection onto plane / border
    end

    %%%% DISPLAY INFORMATION, THIS PART IS NOT NECESSAY, ONLY FOR DEBUGGING

    if Aq*x ~= 0
        disp('ACTIVE SET CONSTRAINTS Aq :')
        Aq
        disp('CURRENT SOLUTION x :')
        x
        disp('MULTIPLICATION OF Aq and x')
        Aq*x
    end

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    % Values smaller than 'eps' are changed into 0
    d(find(abs(d-0) < eps)) = 0;
    if ~isempty(find(d~=0)) && rank(P) < length(x) % Line search for optimal lambda
        lopt = ((g*d)/(d'*Qd*d));       
        lmax = inf;   
        for i = 1:length(x)
            if d(i) < 0 && -x(i) ~= 0 && -x(i)/d(i) <= lmax
                lmax = -x(i)/d(i);
            elseif d(i) > 0 && (C-x(i))/d(i) <= lmax
                lmax = (C-x(i))/d(i);
            end
        end 
        lambda = max(0, min([lopt, lmax]));
        if abs(lambda) < eps
            lambda = 0;
        end
        xo = x;
        x = x + lambda*d;
        iters = iters + 1;
    end
    % Check whether search direction is 0-vector or 'e'-criterion met.
    if isempty(find(d~=0)) || abs(L(x)-L(xo)) < e
        optfound = 1;
    end
end

%%% STEP 3: GET THE WEIGHTS
alphas = x;
w = zeros(1, length(data(1,:)));
for i = 1:size(data,1)
    w = w + labels(i)*alphas(i)*data(i,:);
end
svinds = find(alphas>0); 
svind = svinds(1);
b = 1/labels(svind) - w*data(svind, :)';



%%% STEP 4: OPTIMALITY CHECK, KKT conditions. See KKT-conditions for reference.
weights = [b; w'];
datadim = length(data(1,:));
Q = [zeros(1,datadim+1); zeros(datadim, 1), eye(datadim)];
A = [ones(size(data,1), 1), data];
for i = 1:length(labels)
    A(i,:) = A(i,:)*labels(i);
end
LagDuG = Q*weights - A'*alphas;
Ac = A*weights - ones(length(labels),1);
alpA = alphas.*Ac;
LagDuG(any(abs(LagDuG-0) < 10^-14)) = 0;
if ~any(alphas < 0) && all(LagDuG == zeros(datadim+1,1)) && all(abs(Ac) >= 0) && all(abs(alpA) < 10^-6)
    disp('Optimal found, Karush-Kuhn-Tucker conditions satisfied.')
else
    disp('Optimal not found, Karush-Kuhn-Tucker conditions not satisfied.')
end

% VISUALIZATION FOR 2D-CASE
if size(data, 2) == 2 
    pinds = find(labels > 0);
    ninds = find(labels < 0);
    plot(data(pinds, 1), data(pinds, 2), 'o', 'MarkerFaceColor', 'red', 'MarkerEdgeColor', 'black')
    hold on
    plot(data(ninds, 1), data(ninds, 2), 'o', 'MarkerFaceColor', 'blue', 'MarkerEdgeColor', 'black')
    Xb = min(data(:,1))-1;
    Xe = max(data(:,1))+1;
    Yb = -(b+w(1)*Xb)/w(2);
    Ye = -(b+w(1)*Xe)/w(2);
    lineh = plot([Xb Xe], [Yb Ye], 'LineWidth', 2);
    supvh = plot(data(find(alphas~=0), 1), data(find(alphas~=0), 2), 'g.');
    legend([lineh, supvh], 'Decision boundary', 'Support vectors');
    hold off
end

NOTE:

If you run the EXAMPLE 1, you should get an output starting with the following:

enter image description here

As you can see, the multiplication between Aq and x don't produce value 0, even though they should. This is not a bad thing in this particular example, but if I have more data points with lots of decimals in them this inaccuracy becomes bigger and bigger problem, because the calculations are not exact. This is bad for example when I'm searching for a new direction vector when I'm moving towards the optimal solution in gradient projection method. The search direction isn't exactly the correct direction, but close to it. This is why I want the exactly correct values...is this possible?

I wonder if the decimals in the data points have something to do with the accuracy of my results. See the picture below:

enter image description here

So the question is: Is this caused by the data or is there something wrong in the optimization procedure...

jjepsuomi
  • 4,223
  • 8
  • 46
  • 74
  • 1
    try changing the accuracy of the representation of the command window. try `format long` and then run it again. Most likely that is not a perfect `0`, however matlab displays it like that for you (because it is very close) – Ander Biguri May 20 '15 at 11:29
  • Hi @AnderBiguri Thank you, I tried your suggestion in the command window. I added a screenshot. So there is no way of getting the 'exact' values? – jjepsuomi May 20 '15 at 11:34
  • `12+28 = 32+8` Equation is balanced. the correct ans is `0` Also when run in script i get the same answer. are you sure you are not copying the decimal values of `x` when you run the code in the script? – Santhan Salai May 20 '15 at 11:44
  • Hi @SanthanSalai the only two lines of code, where I modify the variable `x` are: `x = insol;` and `x = x + lambda*d;` where `insol` is a vector of zeroes, `lambda` is a real value of type `double` and `d` is a vector. P.S. `insol` is created with function `zeros` – jjepsuomi May 20 '15 at 11:48
  • I cannot reproduce the results you present as example. First of all: are you sure that all the numbers that you use are in `double` precision? MATLAB has this funny (and sometimes annoying) way of converting `double` to weaker precisions when mixing types (try `0.5 * int32(3)` at console to see what I mean). Most people expect the other way around. –  May 20 '15 at 12:10
  • Duplicate? http://stackoverflow.com/q/686439/2917957 – David May 20 '15 at 12:22
  • Thank you @CST-Link for your help. No I'm not sure of the precisions...I need to check that :) – jjepsuomi May 20 '15 at 12:24
  • @CST-Link What if I post my code here? Maybe you could try it as well? – jjepsuomi May 21 '15 at 09:16
  • P.S. I checked...all my variables contain class `double` values. – jjepsuomi May 21 '15 at 09:23
  • @jjepsuomi I use R2012a, and I cannot reproduce the non-zero result. Yeah, I'll run the code if you post it. It would be easier to trace. Hopefully it's not highly input-sensitive. :-) –  May 21 '15 at 09:29
  • Okay, I will try to clean it up and set up some comments with example of usage :) Thank you very much!! =) Wait for few mins ;) – jjepsuomi May 21 '15 at 09:31
  • @CST-Link Hi! I added the code now :) There are few example usages on how to use it :) P.S. the code isn't as optimal as possible, this is just my own testing x) The optimal solution for the example 1: is `X = [1/2 1/2 1 0]` – jjepsuomi May 21 '15 at 10:03
  • @CST-Link All the code needed should be there now. I added lastly also some visualization =) Please let me know if you get the same result as I do, I attached screenshots of what I get :) – jjepsuomi May 21 '15 at 10:26
  • You can also try the script with a random e.g. 2D data set (If 2D then you get a nice visualization ;) ). The data set must be LINEARLY SEPARABLE so that you can put a line between the two classes of clusters of data points. The data points in cluster one should have all label +1 and cluster 2 data points should have label -1 – jjepsuomi May 21 '15 at 10:33
  • I see `inv` inside your code now, see my updated answer, maybe it helps. – Martin Adámek May 21 '15 at 10:48
  • @jjepsuomi Sorry, I was caught up a bit in something else. I'll try the code starting from now. –  May 21 '15 at 12:28

2 Answers2

3

Do you use format function inside your script? It looks like you used somewhere format rat.

You can always use matlab eps function, that returns precision that is used inside matlab. The absolute value of -1/18014398509481984 is smaller that this, according to my Matlab R2014B:

format long
a = abs(-1/18014398509481984)
b = eps
a < b

This basically means that the result is zero (but matlab stopped calculations because according to eps value, the result was just fine).

Otherwise you can just use format long inside your script before the calculation.

Edit

I see inv function inside your code, try replacing it with \ operator (mldivide). The results from it will be more accurate as it uses Gaussian elimination, without forming the inverse.

The inv documentation states:

In practice, it is seldom necessary to form the explicit inverse of a matrix. A frequent misuse of inv arises when solving the system of linear equations Ax = b. One way to solve this is with x = inv(A)*b. A better way, from both an execution time and numerical accuracy standpoint, is to use the matrix division operator x = A\b. This produces the solution using Gaussian elimination, without forming the inverse.

Martin Adámek
  • 16,771
  • 5
  • 45
  • 64
  • Hi Martin, yes I do, in the very beginning of the script `format rat`. Thank you :) I will try your solution, by setting values less that `eps` to `0`. I tried the `format long` in the beginning of my script. Didn't improve the accuracy, still the multiplication was different from `0` – jjepsuomi May 20 '15 at 11:39
  • That is weird, and the result was this time in form of the long fixed-decimal format or still in the ratio form? What is the result now? – Martin Adámek May 20 '15 at 11:49
  • Yes, in the very beginning of my function I have the line: `format long` and the result I get now for the multiplication of `Aq` and `x` is `-5.551115123125783e-17`. P.S. I just now also tried: `Aq*x < eps` which returned `1` – jjepsuomi May 20 '15 at 11:52
  • I would store the result to another variable and compare `abs` value of that to `eps`, otherwise you will make the computation twice... Or if you just want to know if the result is zero, than this approach is fine. – Martin Adámek May 20 '15 at 12:03
  • Sry, I was unavailable for a while ;) – jjepsuomi May 20 '15 at 17:49
  • Replacing the `inv` with `\ ` unfortunately did not help enough :/ It actually gave a little bit worse result than `inv` in the Example 1 case...Thank you anyway =) – jjepsuomi May 21 '15 at 11:07
  • Btw I tried the original code in your question in my Matlab R2014b (64b) and it produces correct result (`0`), so I believe there could be some problem in your Matlab version also (do you use 32b? is upgrading to R2014b an option to you?). – Martin Adámek May 21 '15 at 11:21
  • Oh, nice to hear that =) Yes it is, I will try that :) Thank you!, I have Matlab R2012b, 8.0.0.783, 64-bit – jjepsuomi May 21 '15 at 11:23
  • Hi, In Matlab R2014a I get value closer to zero: `-1.110223024625157e-16`, which is smaller than `eps`, but not exactly 0. I can possibly also try with Matlab R2015a – jjepsuomi May 21 '15 at 12:05
2

With the provided code, this is how I tested:

  1. I added a break-point on the following code:

    if Aq*x ~= 0
        disp('ACTIVE SET CONSTRAINTS Aq :')
        Aq
        disp('CURRENT SOLUTION x :')
        x
        disp('MULTIPLICATION OF Aq and x')
        Aq*x
    end
  2. When the if branch was taken, I typed at console:

    K>> format rat; disp(x);
              12/65
              28/65
              32/65
               8/65
    
    

    K>> disp(x == [12/65; 28/65; 32/65; 8/65]); 0 1 0 0

    K>> format('long'); disp(max(abs(x - [12/65; 28/65; 32/65; 8/65]))); 1.387778780781446e-17

    K>> disp(eps(8/65)); 1.387778780781446e-17

This suggests that this is a displaying problem: the format rat deliberately uses small integers for expressing the value, on the expense of precision. Apparently, the true value of x(4) is the next one to 8/65 than can be possibly put in double format.

So, this begs the question: are you sure that numeric convergence depends on flipping the least significant bit in a double precision value?

  • Hi @CST-Link no I'm not 100% sure. I just started to wonder this because in some examples (with some data set) I get the optimal solution whereas in some other case (with some other data set with lots if decimals in the data points) I don't get the optimal solution with this method. I will do some checking on my optimization procedure nonetheless, Thank you very much! =) The only way for me to be sure would be to do the calculations manually by hand, but that would take a lot of paper and time x) – jjepsuomi May 21 '15 at 13:15
  • You can try this with my code. First do the example 1 from my code. You should get graph with a line separating the data clusters and the line should be as "middle" as possible between the data points. Secondly create a 2D data set using `rand` function with two separate clusters with label values +1 for data points in cluster 1 and label values -1 in cluster 2. Probably in the second case the line won't as "middle" as possible between the data clusters. I made a hypothesis that perhaps the decimals could have something to do with this?...in the first case we have integer values for the data.. – jjepsuomi May 21 '15 at 13:24
  • ...points whereas in the second case the data values can be for example 3.32546325543254543 etc. – jjepsuomi May 21 '15 at 13:25
  • see my post I added another pic, showing how the decision boundary is found with different data set. – jjepsuomi May 21 '15 at 13:36
  • @jjepsuomi I'm going to try a bit later, still at work right now, sorry... :-) –  May 21 '15 at 13:40
  • Of course no problem, take your time. Appreciate your help very much! Thank you :) – jjepsuomi May 21 '15 at 13:41