If you think about how the support vector machine might "use" the kernel matrix, you'll see that you can't really do this in the way you're trying (as you've seen :-)
I actually struggled a bit with this when I was first using kernlab + a kernel matrix ... coincidentally, it was also for graph kernels!
Anyway, let's first realize that since the SVM doesn't know how to calculate your kernel function, it needs to have these values already calculated between your new (testing) examples and the examples it picks out as support vectors during the training step.
So, you'll need to calculate the kernel matrix for all of your examples together. You'll later train on some and test on the others by removing rows + columns from the kernel matrix when appropriate. Let me show you with code.
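To make that concrete, here's a rough sketch of the "compute everything up front" step. I'm using `rbfdot` as a stand-in for whatever graph kernel you're actually computing offline -- the only point is that `K` should cover every example, train and test alike:

library(kernlab)
# toy data standing in for your graphs: 20 examples, 5 features each
x <- matrix(rnorm(100), ncol=5)
# compute the kernel between *all* pairs of examples up front;
# rbfdot is just a placeholder for your real (graph) kernel
K <- kernelMatrix(rbfdot(sigma=0.1), x)
dim(K)
# [1] 20 20   (one row/column per example)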
We can use the example code in the `ksvm` documentation to load our workspace with some data:
library(kernlab)
example(ksvm)
You'll need to hit return a few (2) times to let the plots draw and the example finish, but you should now have a kernel matrix in your workspace called `K`. We'll need to recover the `y` vector that it should use for its labels (as it has been trampled over by other code in the example):
y <- matrix(c(rep(1,60),rep(-1,60)))
Now, pick a subset of examples to use for testing:
holdout <- sample(1:ncol(K), 10)
From this point on, I'm going to:

- Create a training kernel matrix named `trainK` from the original `K` kernel matrix.
- Create an SVM model from my training set `trainK`.
- Use the support vectors found by the model to create a testing kernel matrix `testK` ... this is the weird part. If you look at the code in `kernlab` to see how it uses the support vector indices, you'll see why it's being done this way. It might be possible to do this another way, but I didn't see any documentation/examples on predicting with a kernel matrix, so I'm doing it "the hard way" here.
- Use the SVM to predict on these features and report accuracy.
Here's the code:
trainK <- as.kernelMatrix(K[-holdout,-holdout]) # 1: training kernel matrix
m <- ksvm(trainK, y[-holdout], kernel='matrix') # 2: train the SVM on trainK
testK <- as.kernelMatrix(K[holdout, -holdout][,SVindex(m), drop=F]) # 3: testing kernel matrix
preds <- predict(m, testK) # 4: predict on the test examples
sum(sign(preds) == sign(y[holdout])) / length(holdout) # == 1 (perfect!)
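If you want to see more than that single accuracy number, a quick confusion table works too (since `preds` is numeric here, `sign()` turns it into class labels):

table(predicted=sign(preds), actual=sign(y[holdout]))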
That should just about do it. Good luck!
Responses to comment below
> What does `K[-holdout,-holdout]` mean? (What does the `-` mean?)
Imagine you have a vector `x`, and you want to retrieve elements 1, 3, and 5 from it; you'd do:
x.sub <- x[c(1,3,5)]
If you want to retrieve everything from `x` except elements 1, 3, and 5, you'd do:
x.sub <- x[-c(1,3,5)]
So `K[-holdout,-holdout]` returns all of the rows and columns of `K` except for the rows and columns we want to hold out.
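A tiny example of the same idea on a matrix:

mat <- matrix(1:9, nrow=3)
mat[-2, -2]  # drops row 2 and column 2, keeping everything else
#      [,1] [,2]
# [1,]    1    7
# [2,]    3    9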
> What are the arguments of your as.kernelMatrix - especially the [,SVindex(m),drop=F] argument (which is particularly strange because it looks like that entire bracket is a matrix index of K)?
Yeah, I inlined two commands into one:
testK <- as.kernelMatrix(K[holdout, -holdout][,SVindex(m), drop=F])
Now that you've trained the model, you want to give it a new kernel matrix with your testing examples. `K[holdout,]` would give you only the rows which correspond to the testing examples, and all of the columns of `K`.
`SVindex(m)` gives you the indexes of your support vectors in your original training matrix -- remember, those rows/cols have `holdout` removed. So for those column indices to be correct (i.e. reference the correct support vector column), I must first remove the `holdout` columns.
Anyway, perhaps this is more clear:
testK <- K[holdout, -holdout]
testK <- testK[,SVindex(m), drop=FALSE]
Now `testK` only has the rows of our testing examples and the columns that correspond to the support vectors. `testK[1,1]` will have the value of the kernel function computed between your first testing example and the first support vector, `testK[1,2]` will have the kernel function value between your first testing example and the second support vector, etc.
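A quick sanity check, assuming the objects from the code above are still in your workspace:

dim(testK)          # length(holdout) rows by number of support vectors
length(SVindex(m))  # should equal ncol(testK)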
Update (2014-01-30) to answer comment from @wrahool
It's been a while since I've played with this, so the particulars of `kernlab::ksvm` are a bit rusty, but in principle this should be correct :-) ... here goes:
> What is the point of `testK <- K[holdout, -holdout]` - aren't you removing the columns that correspond to the test set?
Yes. The short answer is that if you want to `predict` using a kernel matrix, you have to supply a matrix with one row per example you want to predict on and one column per support vector. For each row of the matrix (the new example you want to predict on), the values in the columns are simply the value of the kernel function evaluated between that example and the corresponding support vector.
The call to `SVindex(m)` returns the indices of the support vectors, given in the dimension of the original training data.
So, first doing `testK <- K[holdout, -holdout]` gives me a `testK` matrix with the rows of the examples I want to predict on, and columns from the same examples (dimension) the model was trained on.
I further subset the columns of `testK` by `SVindex(m)` to only give me the columns which (now) correspond to my support vectors. Had I not done the first `[, -holdout]` selection, the indices returned by `SVindex(m)` might not correspond to the right examples (unless all `N` of your testing examples are the last `N` columns of your matrix).
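If you want to convince yourself of that bookkeeping, you can map the support vector indices back into the original coordinates of `K` and compare (again assuming the objects from the code above are in your workspace):

sv.cols <- (1:ncol(K))[-holdout][SVindex(m)]  # SV positions in the original K
all.equal(as.vector(testK), as.vector(K[holdout, sv.cols]))  # TRUE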
> Also, what exactly does the drop = FALSE condition do?
It's a bit of defensive coding to ensure that after the indexing operation is performed, the object that is returned is of the same type as the object that was indexed.
In R, if you index only one dimension of a 2D (or higher-dimensional) object, you are returned an object of the lower dimension. I don't want to pass a `numeric` vector into `predict` because it wants a `matrix`. For instance:
x <- matrix(rnorm(50), nrow=10)
class(x)
# [1] "matrix"
dim(x)
# [1] 10 5
y <- x[, 1]
class(y)
# [1] "numeric"
dim(y)
# NULL
The same will happen with `data.frame`s, etc.
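And for completeness, the same indexing with drop = FALSE keeps the 2D structure:

y <- x[, 1, drop=FALSE]
class(y)
# [1] "matrix"
dim(y)
# [1] 10 1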