1

I am trying to do knn classification using knncat in R since I have categorical attributes in my data set.

knncat(FinalData, FinalTestData, k=10, classcol = 15)

When I execute the above statement, I get the error: Sets of levels in train and test do not match.

When I checked the levels of every attribute, I did find a difference. I have a country attribute that can take the values 1-41 in the training data set.

However, in the test data set one particular country never appears, and this causes the error. How am I supposed to deal with that?
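
The level check described above can be sketched roughly as follows. The data frames here are small hypothetical stand-ins for `FinalData` and `FinalTestData` (the column names and values are assumptions, not the real data); the idea is only to list which factor columns differ and which levels are missing.

```r
# Hypothetical stand-ins for the real training and test data frames.
train <- data.frame(
  country = factor(c("AT", "BE", "CH", "BE")),
  class   = factor(c("y", "n", "y", "n"))
)
test <- data.frame(
  country = factor(c("AT", "BE")),
  class   = factor(c("y", "n"))
)

# Names of the columns that are factors in the training data.
factor_cols <- names(train)[sapply(train, is.factor)]

# Keep only the factor columns whose level sets differ between train and test.
mismatched <- Filter(
  function(col) !identical(levels(train[[col]]), levels(test[[col]])),
  factor_cols
)
print(mismatched)

# Levels that exist in train but never occur in test.
missing_in_test <- setdiff(levels(train$country), levels(test$country))
print(missing_in_test)
```

Running the same check on the real data should point at the country column and the one country that never appears in the test set.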

Has QUIT--Anony-Mousse
user3291389

2 Answers

2

I'm not sure, but you could match the factor levels as shown below.

train <- factor(c("a","b","c"))
test <- factor(c("a","b"))
levels(test) <- levels(train)
test   
[1] a b
Levels: a b c
Jaehyeon Kim
  • Actually, reassigning levels like that can be dangerous (try with `train <- factor(c("a","b","c")); test <- factor(c("c","b")); test` and see that an "a" magically appears in `test`). Better to use `test <- factor(test, levels=levels(train))` – MrFlick Apr 20 '15 at 19:13
  • Thank you @MrFlick, it worked perfectly. However, now it gives me an error that some factors have empty levels. Can you please help with how to solve this? – user3291389 Apr 20 '15 at 19:25
  • @user3291389 You would need to provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in your question to help you further – MrFlick Apr 20 '15 at 19:26
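
To make the difference between the two approaches in this thread concrete, here is a minimal sketch using the vectors from MrFlick's comment. The variable names `relabelled` and `rebuilt` are introduced here purely for illustration.

```r
train <- factor(c("a", "b", "c"))
test  <- factor(c("c", "b"))   # levels of test are "b" and "c"

# Approach from the answer: overwrite the level labels by position.
# The first level "b" is renamed to "a" and the second level "c" to "b",
# so the data values silently change.
relabelled <- test
levels(relabelled) <- levels(train)
print(as.character(relabelled))

# Approach from the comment: rebuild the factor on the training levels.
# Values are matched by label, so the data stay the same.
rebuilt <- factor(test, levels = levels(train))
print(as.character(rebuilt))
print(levels(rebuilt))
```

The rebuilt factor keeps the original values and gains the full level set, but it then necessarily carries a level with no observations in the test data. Whether `knncat` accepts that is not something this sketch checks.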
0

Perhaps I am wrong, but wouldn't this still be problematic, given that the KNN algorithm is based on Euclidean distance calculations? Wouldn't you still need to create a binary variable for each level of your categorical features? That would again cause an issue whenever certain levels do not appear in both the training and test sets.

Could someone perhaps enlighten me on this?

Also, as a note, this is meant to be more of a spur than a hijack.
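
For what it's worth, the binary-variable concern can be illustrated with a small sketch. It assumes a plain one-hot encoding via `model.matrix` on hypothetical data and says nothing about how `knncat` itself treats categorical variables. If the test factor is built on the training levels, both design matrices get the same columns, and a level that is absent from the test set simply becomes an all-zero column.

```r
# Hypothetical data; the test factor is declared with the training levels.
train <- data.frame(country = factor(c("AT", "BE", "CH")))
test  <- data.frame(
  country = factor(c("CH", "AT"), levels = levels(train$country))
)

# One indicator column per level ("- 1" drops the intercept).
x_train <- model.matrix(~ country - 1, data = train)
x_test  <- model.matrix(~ country - 1, data = test)

# Both matrices have the same columns, in the same order.
print(identical(colnames(x_train), colnames(x_test)))

# The level "BE" never occurs in test, so its column sums to zero.
print(colSums(x_test))
```

With matching columns, distances between training and test rows are at least well defined, although the open question about unseen or empty levels in `knncat` itself remains.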

ρяσѕρєя K