I know that the idea of using a kernel in SVM is to map the data points into a high-dimensional (possibly infinite-dimensional) feature space where they become linearly separable, so that we can then find a maximum-margin hyperplane separating them. But then why do we need a soft margin if we are able to separate all the points?! As far as I know, the idea behind the soft margin is that if we cannot fully separate all the points, we find the best possible margin anyway. So if we are using a kernel function, the whole idea of the soft margin makes no sense to me. What am I missing?
- Not sure about duplication; it's a different way to ask which arrives at the same answers – Nicolas78 Apr 18 '14 at 11:18
2 Answers
Even if you are able to create a perfectly fitting separating hyperplane, it may be overly complex and thus prone to overfitting. One of the beauties of the SVM is that the soft-margin formulation quite naturally provides a way to trade precision against generalization.
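To make that trade-off concrete: below is a minimal pure-Python sketch that minimizes the primal soft-margin objective 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b)) by full-batch subgradient descent. The data set, step-size schedule, and function name are all illustrative assumptions, not anything from this thread.

```python
def train_soft_margin(points, labels, C, epochs=30000):
    """Full-batch subgradient descent on 0.5*||w||^2 + C*sum(hinge)."""
    w = [0.0, 0.0]
    b = 0.0
    for epoch in range(epochs):
        lr = 1e-3 * (0.1 ** (epoch // 10000))    # crude step-size decay
        gw = [w[0], w[1]]                        # gradient of 0.5*||w||^2
        gb = 0.0
        for (x1, x2), y in zip(points, labels):
            if y * (w[0]*x1 + w[1]*x2 + b) < 1:  # hinge is active here
                gw[0] -= C * y * x1
                gw[1] -= C * y * x2
                gb    -= C * y
        w[0] -= lr * gw[0]
        w[1] -= lr * gw[1]
        b    -= lr * gb
    return w, b

# Two clean clusters plus one "noise" negative point sitting close to
# the positive cluster: separable, but only with a very narrow margin.
X = [(2, 2), (3, 3), (3, 2), (2, 3),
     (-2, -2), (-3, -3), (-3, -2), (-2, -3),
     (0.5, 0.5)]
y = [1, 1, 1, 1, -1, -1, -1, -1, -1]

w_soft, b_soft = train_soft_margin(X, y, C=0.01)  # wide margin, tolerates noise
w_hard, b_hard = train_soft_margin(X, y, C=10.0)  # near-hard margin, fits noise

norm_soft = (w_soft[0]**2 + w_soft[1]**2) ** 0.5
norm_hard = (w_hard[0]**2 + w_hard[1]**2) ** 0.5
# Large C buys zero training error at the price of a much larger ||w||,
# i.e. a much narrower margin -- the overfitting risk described above.
```

With small C the optimizer keeps a wide margin and simply tolerates the noise point; with large C it classifies every point, but ||w|| roughly triples, so the margin shrinks to about a third. In scikit-learn this same dial is the `C` argument of `svm.SVC`.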

Nicolas78
- See this question http://stackoverflow.com/questions/4629505/svm-hard-or-soft-margins for a much more elaborate answer to a related (though not identical) question – Nicolas78 Apr 17 '14 at 14:45
You may have duplicates with different labels. Then it's fairly obvious you cannot find a plane that separates your data.
Class A: (3,3) (2,2) (1,1)
Class B: (0,0) (1,1) (2,2)
Soft-margin will still yield a reasonable result on a contradictory data set. In reality, data is not well behaved, and the kernel trick cannot always make it linearly separable. Some data is just hard to separate.
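To see this on exactly the contradictory set above, here is a small pure-Python soft-margin trainer (full-batch subgradient descent on the primal objective 0.5*||w||^2 + C*sum(hinge); the code and its parameters are an illustrative sketch, not something from the answer):

```python
def train_soft_margin(points, labels, C=1.0, epochs=30000):
    """Full-batch subgradient descent on 0.5*||w||^2 + C*sum(hinge)."""
    w = [0.0, 0.0]
    b = 0.0
    for epoch in range(epochs):
        lr = 1e-3 * (0.1 ** (epoch // 10000))    # crude step-size decay
        gw = [w[0], w[1]]                        # gradient of 0.5*||w||^2
        gb = 0.0
        for (x1, x2), y in zip(points, labels):
            if y * (w[0]*x1 + w[1]*x2 + b) < 1:  # hinge active for this point
                gw[0] -= C * y * x1
                gw[1] -= C * y * x2
                gb    -= C * y
        w[0] -= lr * gw[0]
        w[1] -= lr * gw[1]
        b    -= lr * gb
    return w, b

# The contradictory data set from the answer: (1,1) and (2,2) appear
# with BOTH labels, so zero training error is impossible.
X = [(3, 3), (2, 2), (1, 1),   # Class A (+1)
     (0, 0), (1, 1), (2, 2)]   # Class B (-1)
y = [1, 1, 1, -1, -1, -1]

w, b = train_soft_margin(X, y, C=1.0)
predictions = [1 if w[0]*x1 + w[1]*x2 + b > 0 else -1 for (x1, x2) in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
# Each duplicated pair can contribute at most one correct prediction,
# so 4/6 is the best any classifier can achieve on this training set.
```

Because (1,1) and (2,2) each carry both labels, no classifier can score above 4/6 here; the soft-margin solution attains that bound with a sensible boundary near x1 + x2 = 3, instead of failing outright the way a hard-margin formulation would (its constraints are infeasible).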

Has QUIT--Anony-Mousse