I'm doing a fairly simple SVM classification at the moment. I use a precomputed kernel in LIBSVM, built from an RBF function over DTW (dynamic time warping) distances.
When I compute the similarity (kernel) matrix, everything seems to work fine, until I permute my data before computing the kernel matrix.
An SVM is of course invariant to permutations of the input data. In the MATLAB code below, the line marked with '<- !!!!!!!!!!' determines the classification accuracy (not permuted: 100%; permuted: anywhere from 0% to 100%, depending on the seed of rng). But why does permuting the array of file names (fileList) make any difference? What am I doing wrong? Have I misunderstood the concept of 'permutation invariance', or is it a problem with my MATLAB code?
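To make precise what I mean by permutation invariance, here is a minimal sketch on toy data. It assumes a plain squared Euclidean distance as a stand-in for my DTW distance, and the matrix X, the labels y and the permutation p are made up for illustration. As far as I understand, reordering the samples should only reorder the rows and columns of the kernel matrix (and the labels) in the same way, so the SVM should see the same problem:

```matlab
% Toy data (hypothetical): 6 samples with 4 values each, two classes
X = reshape(1:24, 6, 4) / 24;
y = [10; 10; 10; 11; 11; 11];
sigma = 0.01;   % same value as in my function

% RBF on a squared Euclidean distance (assumption: stand-in for DTW)
rbf = @(a, b) exp(sum((a - b).^2) / (-2 * sigma));

% Kernel matrix in the original sample order
K = zeros(6);
for i = 1:6
    for j = 1:6
        K(i, j) = rbf(X(i, :), X(j, :));
    end
end

% Permute samples and labels together, then recompute the kernel from scratch
p = [3 1 6 2 5 4];
Xp = X(p, :);
yp = y(p);
Kp = zeros(6);
for i = 1:6
    for j = 1:6
        Kp(i, j) = rbf(Xp(i, :), Xp(j, :));
    end
end

% Largest deviation between the recomputed kernel and the reordered original
maxDiff = max(max(abs(Kp - K(p, p))));
```

In this sketch maxDiff is zero up to rounding, which is the behaviour I would also expect from my real kernel matrices.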
My csv-files are formatted as LABEL, val1, val2, ..., valN, and all of them are stored in the folder dirName. So the string array contains the entries '10_0.csv 10_1.csv .... 11_7.csv, 11_8.csv' when not permuted, or the same entries in some other order when permuted.
I also tried permuting the vector of sample serial numbers (the first column of the kernel matrix), but that makes no difference.
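For illustration, this is roughly how one such file is split into its label and its series. The file content below is made up, and the temporary file only stands in for one of my real csv-files:

```matlab
% Write a hypothetical sample in the LABEL, val1, ..., valN layout
tmpFile = [tempname, '.csv'];
csvwrite(tmpFile, [10, 0.5, 0.7, 0.2]);

% Read it back: the first entry is the class label, the rest is the series
x = csvread(tmpFile);
label = x(1);
series = x(2:end);

delete(tmpFile);
```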
function [SimilarityMatrixTrain, SimilarityMatrixTest, trainLabels, testLabels, PermSimilarityMatrixTrain, PermSimilarityMatrixTest, permTrainLabels, permTestLabels] = computeDistanceMatrix(dirName, verificationClass, trainFrac)
    fileList = getAllFiles(dirName);
    fileList = fileList(1:36);

    trainLabels = [];
    testLabels = [];
    permTrainLabels = [];
    permTestLabels = [];
    n = 0;          % counts kernel evaluations
    sigma = 0.01;   % RBF width parameter

    % Alternate the files between training set and test set
    trainFiles = fileList(1:2:end);
    testFiles = fileList(2:2:end);

    % Permuted copies of both file lists
    rng(3);
    permTrain = randperm(length(trainFiles))
    %rng(3); <- !!!!!!!!!!!
    permTest = randperm(length(testFiles));
    permTrainFiles = trainFiles(permTrain)
    permTestFiles = testFiles(permTest);

    noTrain = numel(trainFiles);
    noTest = numel(testFiles);
    SimilarityMatrixTrain = eye(noTrain);
    PermSimilarityMatrixTrain = eye(noTrain);
    SimilarityMatrixTest = eye(noTest);
    PermSimilarityMatrixTest = eye(noTest);

    % UNPERM
    % Train
    for i = 1 : noTrain
        x = csvread(trainFiles{i});
        label = x(1);
        trainLabels = [trainLabels, label];
        for j = 1 : noTrain
            y = csvread(trainFiles{j});
            dtwDistance = dtwWrapper(x(2:end), y(2:end));
            rbfValue = exp((dtwDistance.^2)./(-2*sigma));
            SimilarityMatrixTrain(i, j) = rbfValue;
            n = n + 1;
        end
    end
    % Prepend the sample serial numbers required for a precomputed kernel
    SimilarityMatrixTrain = [(1:size(SimilarityMatrixTrain, 1))', SimilarityMatrixTrain];

    % Test
    for i = 1 : noTest
        x = csvread(testFiles{i});
        label = x(1);
        testLabels = [testLabels, label];
        for j = 1 : noTest
            y = csvread(testFiles{j});
            dtwDistance = dtwWrapper(x(2:end), y(2:end));
            rbfValue = exp((dtwDistance.^2)./(-2*sigma));
            SimilarityMatrixTest(i, j) = rbfValue;
            n = n + 1;
        end
    end
    SimilarityMatrixTest = [(1:size(SimilarityMatrixTest, 1))', SimilarityMatrixTest];

    % PERM
    % Train
    for i = 1 : noTrain
        x = csvread(permTrainFiles{i});
        label = x(1);
        permTrainLabels = [permTrainLabels, label];
        for j = 1 : noTrain
            y = csvread(permTrainFiles{j});
            dtwDistance = dtwWrapper(x(2:end), y(2:end));
            rbfValue = exp((dtwDistance.^2)./(-2*sigma));
            PermSimilarityMatrixTrain(i, j) = rbfValue;
            n = n + 1;
        end
    end
    PermSimilarityMatrixTrain = [(1:size(PermSimilarityMatrixTrain, 1))', PermSimilarityMatrixTrain];

    % Test
    for i = 1 : noTest
        x = csvread(permTestFiles{i});
        label = x(1);
        permTestLabels = [permTestLabels, label];
        for j = 1 : noTest
            y = csvread(permTestFiles{j});
            dtwDistance = dtwWrapper(x(2:end), y(2:end));
            rbfValue = exp((dtwDistance.^2)./(-2*sigma));
            PermSimilarityMatrixTest(i, j) = rbfValue;
            n = n + 1;
        end
    end
    PermSimilarityMatrixTest = [(1:size(PermSimilarityMatrixTest, 1))', PermSimilarityMatrixTest];

    % Train and evaluate with the precomputed kernel (-t 4)
    mdlU = svmtrain(trainLabels', SimilarityMatrixTrain, '-t 4 -c 0.5');
    mdlP = svmtrain(permTrainLabels', PermSimilarityMatrixTrain, '-t 4 -c 0.5');
    [pclassU, xU, yU] = svmpredict(testLabels', SimilarityMatrixTest, mdlU);
    [pclassP, xP, yP] = svmpredict(permTestLabels', PermSimilarityMatrixTest, mdlP);

    % Display the accuracy of the unpermuted and the permuted run
    xU
    xP
end
I'd be very thankful for any answer!
Regards Benjamin