I was reading A Practical Guide to Support Vector Classification by Chih-Wei Hsu to try to make my SVM and decision tree run faster, and the guide mentions that scaling the data before fitting an SVM is important. I have a dataset with 25 columns, one of which is of type factor. When I tried to scale the data, I got an error saying that column x must be numeric. When I converted the factor column to numeric, the scale function worked.

Will converting a categorical variable to numeric and scaling it affect my results negatively?

nullUser
  • Hi, welcome to SO. SO is intended for coding-related questions, so you have a better chance of getting useful answers if you include code. That said, this question does not appear to be code-related; the most suitable forum for it would be Cross Validated. One note: the passage you mention most likely refers to cases where you have quantitative variables. I have no idea what it would mean to scale categorical (nominal) variables. There is no meaning in recoding one as numeric and then scaling it. – hamagust Jun 03 '22 at 14:36

1 Answer

It is not a good idea to scale categorical variables. The best practice is to encode them with one-hot encoding, which produces a 0/1 indicator column for each category.
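
A minimal sketch of that approach in base R, assuming a small made-up data frame: the column names `limit_bal`, `pay_0`, `education`, and `default` are hypothetical and not taken from the asker's data. It scales only the numeric predictors with `scale()`, expands the factor predictor into 0/1 indicator columns with `model.matrix()`, and leaves the response as a factor.

```r
# Hypothetical toy data; names and values are illustrative assumptions only.
df <- data.frame(
  limit_bal = c(20000, 120000, 90000, 50000, 50000, 500000),
  pay_0     = c(2, -1, 0, 0, -1, 0),
  education = factor(c(2, 2, 1, 3, 2, 1)),
  default   = factor(c(1, 1, 0, 0, 0, 0))   # response: keep as a factor
)

# Scale only the numeric predictors (centers to mean 0, scales to sd 1).
num_cols <- c("limit_bal", "pay_0")
scaled <- scale(df[num_cols])

# One-hot encode the factor predictor; "- 1" drops the intercept so that
# every level of `education` gets its own 0/1 column.
dummies <- model.matrix(~ education - 1, data = df)

# Combine the scaled numerics, the indicator columns, and the untouched response.
prepared <- data.frame(scaled, dummies, default = df$default)
str(prepared)

# Pitfall: as.numeric() on a factor returns the internal level codes
# (1 and 2 here), not the original 0/1 labels.
as.numeric(df$default)
# To recover the labels, go through character first.
as.numeric(as.character(df$default))
```

For a two-level predictor, a single 0/1 column already carries all the information, so it can be left unscaled as is.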

Mohamed Desouky
  • My categorical variable has 2 levels. When I imported my data its type was character (1, 0); I converted it to a factor and scale refused to run. My data has 17 columns with values ranging from 50 to 50k, my response variable is categorical (type character, 1/0), 4 columns range from 1 to 3 for age, marriage, sex, and education, and the other 4 columns range from -2 to 9 and represent payment history. Should I ignore the categorical variable and scale only the 17 columns plus the other 8 columns, OR scale all the columns and perform one-hot encoding for the categorical variable? – nullUser Jun 03 '22 at 09:55
  • Would you `dput` the head of your data? – Mohamed Desouky Jun 03 '22 at 09:58
  • It's a lot; I can't copy all the output. Is it OK if I use str() or summary()? – nullUser Jun 03 '22 at 10:11
  • Just copy ‘dput(head(your data))’ – Mohamed Desouky Jun 03 '22 at 10:37
  • Read this, it will help: [one-hot encoding](https://stackoverflow.com/questions/48649443/how-to-one-hot-encode-several-categorical-variables-in-r) – Mohamed Desouky Jun 03 '22 at 10:47