
What is the proper way to normalize feature vectors for use in a linear-kernel SVM?

Looking at LIBSVM, it appears the standard approach is simply to rescale each feature linearly to a common fixed range (its svm-scale tool maps each feature to something like [-1, 1] or [0, 1]). However, PyML doesn't seem to provide a way to scale the data this way. Instead, there are options to normalize each vector by its length, to shift each feature value by its mean while rescaling by the standard deviation, and so on.

I am dealing with a case in which most features are binary, with a few that are numeric.

agrin

1 Answer


I am not an expert in this, but I believe centering and scaling each feature, by subtracting its mean and then dividing by its standard deviation, is a typical way to normalize features for use with SVMs. In R, this can be done with the scale function.
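
For illustration, here is a minimal sketch of that standardization in Python with NumPy (the matrix X and its values are just hypothetical example data; rows are samples, columns are features):

import numpy as np

# Hypothetical data: rows are samples, columns are features.
X = np.array([[0., 1., 10.],
              [1., 0., 20.],
              [1., 1., 40.]])

mu = X.mean(axis=0)       # per-feature mean
sigma = X.std(axis=0)     # per-feature standard deviation
sigma[sigma == 0] = 1.0   # guard: leave constant features unchanged
X_std = (X - mu) / sigma  # each column now has mean 0 and unit variance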

Another way is to rescale each feature to the [0,1] range:

(x - min(x)) / (max(x) - min(x))
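
As a sketch, the same formula applied per feature (column) in Python with NumPy; the data here is again hypothetical:

import numpy as np

X = np.array([[0., 10.],
              [1., 20.],
              [1., 40.]])

mins = X.min(axis=0)
ranges = X.max(axis=0) - mins
ranges[ranges == 0] = 1.0  # guard against constant features
X01 = (X - mins) / ranges  # each column now lies in [0, 1]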

Some features might also benefit from a log transformation if their distribution is very skewed, but keep in mind that this changes the shape of the distribution rather than merely shifting it.
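
If you wanted to try this on a non-negative feature, a one-line sketch (np.log1p computes log(1 + x), which is safe at zero):

import numpy as np

x = np.array([0., 1., 10., 100., 1000.])  # hypothetical skewed feature
x_log = np.log1p(x)                       # log(1 + x) compresses large values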

I am not sure what you gain in an SVM setting by normalizing each vector by its L1 or L2 norm, as PyML does with its normalize method. I would guess that binary features (0 or 1) don't need to be normalized at all.
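
For completeness, a sketch of what per-sample L2 normalization looks like in Python with NumPy (I'm assuming this mirrors what PyML's normalize does; the data is hypothetical):

import numpy as np

X = np.array([[3., 4.],
              [1., 0.]])

norms = np.linalg.norm(X, axis=1, keepdims=True)  # L2 norm of each row
norms[norms == 0] = 1.0                           # guard against zero vectors
X_unit = X / norms                                # each row now has unit length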

Tony
  • v=[stats.contr,stats.corrm,stats.energ,stats.entro,stats.homom]; o=(v - min(v)) / (max(v) - min(v)); I tried this code, but the values are still not in the range [0,1]. Is there anything wrong with my code? – Gomathi Mar 22 '12 at 16:00
  • I don't understand what your first statement is supposed to do. v is supposed to be a vector of numbers holding values for a feature. – Tony Apr 10 '12 at 11:57