2

I am using the following function:

kernel = @(X,Y,sigma) exp((-pdist2(X,Y,'euclidean').^2)./(2*sigma^2));

to compute a series of kernels, in the following way:

K = [(1:size(featureVectors,1))', kernel(featureVectors,featureVectors, sigma)];

However, since featureVectors is a huge matrix (something like 10000x10000), it takes really a long time to compute the kernels (e.g., K).

Is it possible to somehow speed up the computation?


EDIT: Context

I am using a classifier via libsvm, with a gaussian kernel, as you may have noticed from the variable names and semantics.

I am using now (more or less) #terms~=10000 and #docs~=10000. This #terms resulted after stopwords removal and stemming. This course indicates that having 10000 features makes sense.

Unfortunately, libsvm does not implement automatically the Gaussian kernel. Thus, it is required to compute it by hand. I took the idea from here, but the kernel computation (as suggested by the referenced question) is really slow.

Community
  • 1
  • 1
Eleanore
  • 1,750
  • 3
  • 16
  • 33
  • Maybe [sparse matrices](http://www.mathworks.de/de/help/matlab/sparse-matrices.html) will help you. – NoDataDumpNoContribution May 28 '14 at 11:31
  • Your feature vectors are really 20000 elements long? That doesn't sound normal. And you really want the pairwise distance between every single feature vector? Certainly not optimal, and not scalable, as you've discovered. I'd say you need to work on your algorithm some more. You're not going to fix this with "micro" optimizations. Feel free to describe the higher level problem for some suggestions. – Peter May 28 '14 at 12:50

2 Answers2

1

You are using pdist2 with two equal input arguments (X and Y are equal when you call kernel). You could save half the time by computing each pair only once. You do that using pdist and then squareform:

kernel = @(X,sigma) exp((-squareform(pdist(X,'euclidean')).^2)./(2*sigma^2));
K = [(1:size(featureVectors,1))', kernel(featureVectors, sigma)];
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
0

Your exponential function will go down very fast. For distances of several sigma your kernel function will essentially be zero. These cases we can sort out and become faster.

function z = kernel(X, Y, sigma)
  d = pdist2(X,Y,'euclidean');
  z = zeros(size(d)); % start with zeros
  m = d < 3 * sigma;
  z(m) = exp(-d(m).^2/(2*sigma^2));
end
NoDataDumpNoContribution
  • 10,591
  • 9
  • 64
  • 104