
I have two multi-class data sets with 5 labels: one for training and the other for cross-validation. Both are stored as .csv files, so they act as a control in this experiment.
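A hypothetical C++ loader for these files might look like the sketch below (load_csv and the label-in-the-first-column layout are assumptions, not part of any library); parsing every cell as a double keeps both implementations reading identical values:

    #include <cstdlib>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    // Hypothetical loader: label in the first column, features in the rest.
    // Parsing every cell with strtod makes the C++ side read exactly the
    // same doubles that MATLAB's csvread/readmatrix produces, so the files
    // really do act as a control.
    bool load_csv(const std::string &path,
                  std::vector<std::vector<double>> &features,
                  std::vector<double> &labels)
    {
        std::ifstream in(path);
        if (!in) return false;
        std::string line;
        while (std::getline(in, line)) {
            std::stringstream row(line);
            std::string cell;
            std::vector<double> vals;
            while (std::getline(row, cell, ','))
                vals.push_back(std::strtod(cell.c_str(), nullptr));
            if (vals.empty()) continue;        // skip blank lines
            labels.push_back(vals[0]);
            features.emplace_back(vals.begin() + 1, vals.end());
        }
        return true;
    }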

I have a C++ wrapper for libsvm, and the MATLAB functions for libsvm.

For both C++ and MATLAB: using a C-type SVM with an RBF kernel, I iterate over two lists of C and gamma values. For each parameter combination, I train on the training data set and then predict the cross-validation data set. I store the prediction accuracy in a 2D map keyed by the C and gamma values that produced it.
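For reference, the C++ side of that loop could look roughly like this with libsvm's C API (grid_point_accuracy and grid_search are illustrative helper names; the svm_problem and validation vectors are assumed to be filled from the .csv files elsewhere):

    #include "svm.h"   // libsvm's C API
    #include <cstddef>
    #include <map>
    #include <utility>
    #include <vector>

    // Train on the fixed training problem, score on the fixed CV set.
    double grid_point_accuracy(const svm_problem &train,
                               const std::vector<const svm_node*> &cv_x,
                               const std::vector<double> &cv_y,
                               double C, double gamma)
    {
        svm_parameter param = {};        // zero everything first
        param.svm_type    = C_SVC;       // C-type SVM
        param.kernel_type = RBF;
        param.C           = C;
        param.gamma       = gamma;
        param.cache_size  = 100;         // libsvm's usual defaults
        param.eps         = 1e-3;
        param.shrinking   = 1;

        svm_model *model = svm_train(&train, &param);

        int correct = 0;
        for (std::size_t i = 0; i < cv_x.size(); ++i)
            if (svm_predict(model, cv_x[i]) == cv_y[i])  // labels are small ints
                ++correct;

        svm_free_and_destroy_model(&model);
        return double(correct) / double(cv_x.size());
    }

    // 2D map keyed by (C, gamma), as described above.
    std::map<std::pair<double, double>, double>
    grid_search(const svm_problem &train,
                const std::vector<const svm_node*> &cv_x,
                const std::vector<double> &cv_y,
                const std::vector<double> &Cs,
                const std::vector<double> &gammas)
    {
        std::map<std::pair<double, double>, double> acc;
        for (double C : Cs)
            for (double g : gammas)
                acc[{C, g}] = grid_point_accuracy(train, cv_x, cv_y, C, g);
        return acc;
    }

Setting every parameter explicitly, rather than relying on each wrapper's defaults, removes one source of divergence between the two implementations.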

I've recreated different training and cross-validation data sets many, many times. Each time, the C++ and MATLAB accuracies differ, sometimes by a lot! Mostly MATLAB produces higher accuracies, but sometimes the C++ implementation is better.

What could account for these differences? The C/gamma values I'm trying are the same, as are the remaining SVM parameters (left at their defaults).


2 Answers


There should be no significant differences, as both the C++ and MATLAB codes are built on the same underlying svm.cpp. So what can be the reason?

  • an implementation error in your code(s); this is unfortunately the most probable one
  • the wrapper you use has a bug and/or uses a different version of libsvm than your MATLAB code (libsvm ships with official Python, MATLAB, and Java wrappers, so your C++ wrapper is "not official"), or your wrapper assumes some additional default values which are not the defaults in the C/MATLAB/Python/Java implementations; a quick way to check this is sketched after this list
  • you perform cross-validation in a somewhat randomized form (shuffling the data and then folding, which is completely correct and reasonable, but will lead to different results in two different runs)
  • there is some rounding/conversion performed while loading data from .csv in one (or both) of your programs, which leads to inconsistencies (really not likely to happen, yet still possible)
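To rule out the default-value and version-mismatch causes above, it can help to dump every field that the C++ side actually hands to svm_train and diff it against the options passed in MATLAB. A minimal sketch (dump_params is an illustrative helper, not part of libsvm):

    #include "svm.h"
    #include <cstdio>

    // Print every field handed to svm_train so it can be diffed line by
    // line against the MATLAB call, e.g.
    //   model = svmtrain(y, x, '-s 0 -t 2 -c <C> -g <gamma>');
    void dump_params(const svm_parameter &p, int num_features)
    {
        std::printf("svm_type=%d kernel_type=%d degree=%d coef0=%g\n",
                    p.svm_type, p.kernel_type, p.degree, p.coef0);
        std::printf("C=%g gamma=%g (libsvm's CLI default gamma is 1/%d)\n",
                    p.C, p.gamma, num_features);
        std::printf("eps=%g cache=%g shrinking=%d probability=%d\n",
                    p.eps, p.cache_size, p.shrinking, p.probability);
    }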
lejlot

I trained an SVC using scikit-learn (sklearn.svm.SVC) within a Python Jupyter Notebook. I wanted to use the trained classifier in MATLAB R2022a and C++, so I needed to verify that all three versions' predictions matched for each implementation of the kernel, decision, and prediction functions. I found some useful guidance in bcorso's implementation of the original libsvm C++ code.

Exporting the structure that represents the trained model is explained in bcorso's post and is required to call his prediction function implementation:

    predict(params, sv, nv, a, b, cs, X)

so that it matches sklearn's prediction for a trained classifier instance, clf:

    clf.predict(X)

Once I established this match, I created MATLAB versions of bcorso's kernel,

function [k] = kernel_svm(params, sv, X)
    % Kernel values between the input X and every support vector.
    nsv = size(sv, 1);   % number of support vectors (rows); the original
                         % length(sv) could pick the feature dimension instead
    k = zeros(1, nsv);
    if strcmp(params.kernel, 'linear')
        for i = 1:nsv
            k(i) = dot(sv(i,:), X);
        end
    elseif strcmp(params.kernel, 'rbf')
        for i = 1:nsv
            k(i) = exp(-params.gamma * dot(sv(i,:) - X, sv(i,:) - X));
        end
    else
        uiwait(msgbox('kernel not defined', 'Error', 'modal'));
    end
    k = k';
end

decision,

function [d] = decision_svm(params, sv, nv, a, b, X)
    %% Calculate the kernel values between X and all support vectors
    kvalue = kernel_svm(params, sv, X);

    %% Start index (1-based) of each class's block of support vectors
    nr_class = length(nv);
    start = zeros(1, nr_class);
    start(1) = 1;
    for i = 1:(nr_class-1)
        start(i+1) = start(i) + nv(i);
    end

    %% Class-pair loops (only the pair (1,2) exists in the binary case)
    for i = 1:nr_class
        for j = i+1:nr_class
            si = start(i);            % class i block start
            ei = si + nv(i) - 1;      % class i block end
            sj = start(j);            % class j block start
            ej = sj + nv(j) - 1;      % class j block end
            sum1 = 0;
            for k = si:ei
                sum1 = sum1 + a(k) * kvalue(k);
            end
            sum2 = 0;
            for k = sj:ej
                sum2 = sum2 + a(k) * kvalue(k);
            end
        end
    end

    %% Add the class sums and the intercept
    d = -(sum1 + sum2 + b);
end

and predict functions.

function [class, classIndex] = predict_svm(params, sv, nv, a, b, cs, X)
    % Binary prediction: a non-positive decision value maps to the first
    % class label, a positive one to the second.
    dec_value = decision_svm(params, sv, nv, a, b, X);
    if dec_value <= 0
        class = cs(1);
        classIndex = 1;
    else
        class = cs(2);
        classIndex = 2;   % 1-based index into cs (0 was a 0-based leftover)
    end
end

Translating the Python comprehension syntax to a MATLAB/C++ equivalent of the summations required nested for loops in the decision function.

It is also necessary to account for MATLAB's 1-based indexing vs. Python/C++'s 0-based indexing.

The trained classifier model is conveyed by params, sv, nv, a, b, cs, which can be gathered within a structure after having exported the sv and a matrices as .csv files from the Python notebook. I simply created a wrapper MATLAB function, svcInfo, that builds the structure:

svcStruct = svcInfo();
params = svcStruct.params;
sv = svcStruct.sv;
nv = svcStruct.nv;
a = svcStruct.a;
b = svcStruct.b;
cs = svcStruct.cs;

Alternatively, one can save the structure's contents as a MATLAB workspace in a .mat file. The new case for prediction is provided as a vector X:

% Classifier input feature vector
X = [x1 x2 ... xn];

A simplified C++ implementation that follows bcorso's Python version is fairly similar to this MATLAB implementation in that it uses the same nested for loops within the decision function, but with zero-based indexing.

Once tested, I may expand this post with the C++ version of the MATLAB code shared above.
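In the meantime, a rough sketch of what that translation could look like, assuming a plain struct (SvcModel, an illustrative name) carries the exported fields; it mirrors the MATLAB functions above with 0-based indexing:

    #include <cmath>
    #include <cstddef>
    #include <string>
    #include <vector>

    // Assumed plain-data model: sv holds the support vectors row by row,
    // a the dual coefficients, nv the per-class support-vector counts,
    // b the intercept, cs the class labels.
    struct SvcModel {
        std::string kernel;                  // "linear" or "rbf"
        double gamma = 0.0;
        std::vector<std::vector<double>> sv;
        std::vector<int> nv;
        std::vector<double> a;
        double b = 0.0;
        std::vector<int> cs;
    };

    // Kernel values between X and every support vector (0-based indexing).
    static std::vector<double> kernel_svm(const SvcModel &m,
                                          const std::vector<double> &X)
    {
        std::vector<double> k(m.sv.size(), 0.0);
        for (std::size_t i = 0; i < m.sv.size(); ++i) {
            if (m.kernel == "linear") {
                for (std::size_t j = 0; j < X.size(); ++j)
                    k[i] += m.sv[i][j] * X[j];
            } else {                         // "rbf"
                double sq = 0.0;
                for (std::size_t j = 0; j < X.size(); ++j) {
                    double d = m.sv[i][j] - X[j];
                    sq += d * d;
                }
                k[i] = std::exp(-m.gamma * sq);
            }
        }
        return k;
    }

    // Binary decision value; the class blocks start at 0 and nv[0].
    static double decision_svm(const SvcModel &m, const std::vector<double> &X)
    {
        std::vector<double> kvalue = kernel_svm(m, X);
        double sum1 = 0.0, sum2 = 0.0;
        for (int k = 0; k < m.nv[0]; ++k)
            sum1 += m.a[k] * kvalue[k];
        for (int k = m.nv[0]; k < m.nv[0] + m.nv[1]; ++k)
            sum2 += m.a[k] * kvalue[k];
        return -(sum1 + sum2 + m.b);
    }

    // Predicted label: cs[0] if the decision value is <= 0, else cs[1].
    int predict_svm(const SvcModel &m, const std::vector<double> &X)
    {
        return decision_svm(m, X) <= 0 ? m.cs[0] : m.cs[1];
    }

As in the MATLAB version, this assumes a binary classifier; the multi-class case would need libsvm's full one-vs-one voting loop.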

DanGitR