4

I am using SKLearn to run SVC on my data.

from sklearn import svm

svc = svm.SVC(kernel='linear', C=C).fit(X, y)

I want to know how I can get the distance of each data point in X from the decision boundary?

user1566200

2 Answers

11

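For a linear kernel, the decision function is y = w * x + b and the decision boundary is the hyperplane where y = 0. The signed distance from a point x to the decision boundary is y/||w||, and the absolute distance is |y|/||w||.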

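import numpy as np
y = svc.decision_function(x)  # w * x + b; the intercept is already included
w_norm = np.linalg.norm(svc.coef_)  # coef_ is only available for the linear kernel
dist = y / w_norm  # signed distance; np.abs(dist) gives the unsigned distance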

For non-linear kernels, svc.coef_ is not available, so the absolute distance cannot be computed this way. You can still use the result of decision_function as a relative distance.
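For the linear case, here is a minimal end-to-end sketch. The toy data from `make_blobs` and the value `C=1.0` are assumptions chosen only for illustration; they are not from the question. It also cross-checks the result against the explicit formula (w * x + b) / ||w||, which shows that `decision_function` already includes the intercept.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Toy two-class data (an assumption, for illustration only).
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

svc = svm.SVC(kernel="linear", C=1.0).fit(X, y)

# decision_function returns w . x + b for every row of X.
scores = svc.decision_function(X)
w_norm = np.linalg.norm(svc.coef_)

signed_dist = scores / w_norm     # the sign tells which side of the boundary
abs_dist = np.abs(signed_dist)    # geometric distance to the hyperplane

# Cross-check against the explicit formula (w . x + b) / ||w||.
manual = (X @ svc.coef_.ravel() + svc.intercept_[0]) / w_norm
```

If only the ordering of points by closeness to the boundary matters, `np.abs(scores)` gives the same ordering, since dividing by `w_norm` is a constant rescaling.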

yangjie
  • What is the difference between the output of the decision function, and y / w_norm? – user1566200 Aug 18 '15 at 16:16
  • 1
    http://stackoverflow.com/questions/11030253/decision-values-in-libsvm the answer by @karenu would be very helpful. – yangjie Aug 18 '15 at 16:28
  • 1
    The decision value is the result of evaluating w*x+b; y/w_norm is the actual distance. So the closer the decision value is to 0, the closer the point is to the decision boundary, and the sign of the decision value indicates the class of the point. – yangjie Aug 18 '15 at 16:34
  • Shouldn't we include the intercept (`svc.intercept_`)? – Morteza Milani Jun 23 '18 at 09:47
2

I happen to be doing homework 1 of a course named Machine Learning Techniques, and it includes a problem about a point's distance to the hyperplane, even for the RBF kernel.

First, we know that an SVM finds an "optimal" w for a hyperplane wx + b = 0.

And the fact is that

w = \sum_{i} \alpha_i \phi(x_i)

where the x_i are the so-called support vectors and the alpha_i are their coefficients. Note that there is a phi() around each x; it is the transform function that maps x to some high-dimensional space (for RBF, it is infinite-dimensional). And we know that

\phi(x_1) \cdot \phi(x_2) = K(x_1, x_2)

so we can compute

\|w\|^2 = w \cdot w = \sum_{i} \sum_{j} \alpha_i \alpha_j \phi(x_i) \cdot \phi(x_j)

= \sum_{i} \sum_{j} \alpha_i \alpha_j K(x_i, x_j)

which gives us the norm of w. So, the distance you want should be

svc.decision_function(x) / w_norm

where w_norm is the norm calculated above.
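The computation above can be sketched with scikit-learn roughly as follows. This is a sketch under stated assumptions: the toy data from `make_circles`, `C=1.0`, and the fixed `gamma = 0.5` are illustrative choices, not part of the original answer. It relies on `svc.dual_coef_` holding the signed coefficients of the support vectors (the alpha_i above, with the class label absorbed into them), and the resulting distance is measured in the kernel's feature space, not in the input space.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel

# Toy non-linearly-separable data (an assumption, for illustration only).
X, y = make_circles(n_samples=100, factor=0.4, noise=0.1, random_state=0)

# Use a fixed numeric gamma so the same value can be passed to rbf_kernel.
gamma = 0.5
svc = svm.SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)

alpha = svc.dual_coef_                              # shape (1, n_SV)
K = rbf_kernel(svc.support_vectors_, gamma=gamma)   # K(x_i, x_j) on the SVs

# ||w||^2 = sum_i sum_j alpha_i alpha_j K(x_i, x_j)
w_norm = np.sqrt((alpha @ K @ alpha.T).item())

# Signed distance to the hyperplane in feature space.
feature_space_dist = svc.decision_function(X) / w_norm
```

This covers the two-class case only; with more than two classes `dual_coef_` has one row per pairwise classifier and the norm would have to be computed per classifier.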

(Stack Overflow doesn't allow me to post more than 2 links, so you will have to render the LaTeX yourself.)

shivams
ctinray