
I want to calculate weighted kernels (for use in an SVM classifier) in Matlab, but I'm currently completely confused.

I would like to implement the following weighted RBF and sigmoid kernels:

Weighted RBF kernel: K(x, y) = exp(-gamma * SUM_i w_i * (x_i - y_i)^2)

Weighted sigmoid kernel: K(x, y) = tanh(gamma * SUM_i w_i * x_i * y_i + b)

x and y are vectors of size n, gamma and b are constants, and w is a vector of size n containing the weights.

The problem now is that Matlab's fitcsvm method needs the kernel evaluated between two matrices, i.e. K(X,Y). For example, the unweighted RBF and sigmoid kernels can be computed as follows:

K_rbf = exp(-gamma .* pdist2(X,Y,'euclidean').^2);
K_sigmoid = tanh(gamma*X*Y' + b);

X and Y are matrices whose rows are the data points (vectors).

How can I compute the above weighted kernels efficiently in Matlab?

machinery
  • Efficient euclidean distances (pdist2) calculation: [Original source](http://stackoverflow.com/a/23911671/3293881), [Explanation and vectorized variations](http://stackoverflow.com/a/26994722/3293881). – Divakar May 07 '16 at 22:38

1 Answer


Simply scale your input by the weights before passing it to the kernel equations. Assume you have a vector w of weights (of the size of the input dimension), your data is in the rows of X, and the features are the columns. Multiply X by w with broadcasting over rows (for example using bsxfun). That's all. Do not do the same to Y, though; multiply only one of the matrices. This holds for every such "weighted" kernel based on a scalar product (like the sigmoid); for distance-based kernels (like the RBF) you want to scale both by the sqrt of w.
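A minimal sketch of both cases (the shapes and the use of bsxfun are my assumptions, matching the question's setup):

% Assumed shapes: X is m-by-n, Y is p-by-n, w is a 1-by-n row vector of weights.
% Distance-based kernel (RBF): scale BOTH matrices by sqrt(w).
Xw = bsxfun(@times, X, sqrt(w));
Yw = bsxfun(@times, Y, sqrt(w));
K_rbf_weighted = exp(-gamma .* pdist2(Xw, Yw, 'euclidean').^2);

% Scalar-product-based kernel (sigmoid): scale only ONE of the matrices by w.
K_sigmoid_weighted = tanh(gamma * bsxfun(@times, X, w) * Y' + b);

(On R2016b or newer, implicit expansion lets you write X .* sqrt(w) directly instead of using bsxfun.)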

Short proofs:

scalar based

f(<wx, y>) = f(SUM_i w_i x_i y_i)   (bilinearity of the scalar product; wx is the element-wise product)

distance based

f(||sqrt(w)x - sqrt(w)y||^2) = f(SUM_i (sqrt(w_i)(x_i - y_i))^2) 
                             = f(SUM_i w_i (x_i - y_i)^2)
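A quick numerical check of the distance-based identity (my own illustration, not part of the original proof):

x = randn(1, 5); y = randn(1, 5); w = rand(1, 5);
lhs = sum(w .* (x - y).^2);               % weighted squared distance
rhs = norm(sqrt(w).*x - sqrt(w).*y)^2;    % plain squared distance on scaled inputs
abs(lhs - rhs) < 1e-12                    % returns logical 1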
lejlot
  • Could you make a small example with bsxfun? I would really appreciate that. Is there a proof of why this holds? I don't see why it holds. – machinery May 07 '16 at 23:34
  • Thank you very much for the proofs. That seems like a good way. Would it be possible for you to briefly show some short Matlab code for how I can do this weighting efficiently (for the scalar- and distance-based cases)? – machinery May 08 '16 at 09:31
  • I was able to apply the weights row-wise by using U = bsxfun(@times,w,U); Let's assume I have a training set A on which I train an SVM. Am I right that in the distance-based case I can apply sqrt(w) to each row of A before applying the SVM, instead of applying sqrt(w) to X and Y inside the kernel calculation? – machinery May 11 '16 at 19:30
  • Yes, this is equivalent to preprocessing (see the sketch after these comments). Just make sure you do not do any data normalization afterwards. – lejlot May 11 '16 at 21:13
  • Alright, but when I do it this way, I also have to apply sqrt(w) to the test data, right? I have one final question regarding the weights. Should the weights sum to one, or should the product of the weights be one? – machinery May 11 '16 at 21:37
  • Yes, you also have to preprocess the test data. In general, you simply add this "module" to your pipeline, like any other preprocessing step. In terms of the weights, they can be arbitrary; they do not have to sum to anything. The worst that can happen is that you will need a bigger/smaller C in your SVM. I would argue that as a heuristic you might use weights that sum to the number of features (as this is what happens without weights - effectively w consists only of 1's, so it sums to the dimensionality of the input space). – lejlot May 11 '16 at 22:46
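A sketch of the preprocessing approach discussed in these comments (A, labels, and Xtest are assumed variable names; the KernelScale conversion is my addition):

% Scale training and test data once by sqrt(w), then use the built-in RBF kernel.
Aw  = bsxfun(@times, A,     sqrt(w));
Xtw = bsxfun(@times, Xtest, sqrt(w));
% fitcsvm's built-in 'rbf' kernel is exp(-||x-z||^2 / s^2); setting
% s = 1/sqrt(gamma) reproduces exp(-gamma*||x-z||^2).
mdl  = fitcsvm(Aw, labels, 'KernelFunction', 'rbf', 'KernelScale', 1/sqrt(gamma));
pred = predict(mdl, Xtw);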