
I have a training data set with 28 variables (13 labels and 15 features), and a test data set with the same 15 features. I have to predict the 13 labels for the test set from those features. I built a separate KNN classifier for each of the 13 labels.

Is there a way to combine these 13 individual single-label KNN classifiers into one multi-label classifier?

My current code for single label:

library(class)

## hold out rows 601:800 of the training data for validation
train_from_train <- train[1:600, 2:16]
target_a_train_from_train <- train[1:600, 17]
test_from_train <- train[601:800, 2:16]
target_a_test_from_train <- train[601:800, 17]

## knn for label "a" (column 17), validated on the held-out rows
knn_pred_a <- knn(train = train_from_train, test = test_from_train,
                  cl = target_a_train_from_train, k = 29)
table(knn_pred_a, target_a_test_from_train)
mean(knn_pred_a != target_a_test_from_train)  # validation error rate

## predict label "a" on the actual test set
knn_pred_a_ON_TEST <- knn(train = train[, 2:16], test = test[, 2:16],
                          cl = train[, 17], k = 29)
knn_pred_a_ON_TEST

I scoured the internet and the package mldr seems to be an option, but I couldn't adapt it to my needs.
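For context, the straightforward way to combine the 13 single-label classifiers is simply to run one knn() per label column and bind the predictions together column-wise. A minimal sketch, using iris as stand-in data (the two binary labels `a` and `b` are fabricated for illustration; k = 29 is taken from my real code and the split sizes are arbitrary):

```r
library(class)

set.seed(1)
dat <- iris[sample(nrow(dat <- iris)), ]
## fabricate two binary labels from Species, just for illustration
labels <- data.frame(a = factor(dat$Species == "setosa"),
                     b = factor(dat$Species == "versicolor"))
features <- dat[, 1:4]
train.idx <- 1:100
test.idx  <- 101:150

## one knn() call per label; predictions collected as columns of a matrix
pred <- sapply(names(labels), function(lab) {
  as.character(knn(train = features[train.idx, ],
                   test  = features[test.idx, ],
                   cl    = labels[train.idx, lab],
                   k     = 29))
})
head(pred)  # one column of predicted labels per target
```

This is just the 13 (here: 2) independent classifiers stacked side by side; it does not share the neighbour search between labels the way a true multi-label method would.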

lmo
Abhijeet
  • Can you add the code for your KNN? The selection of the nearest neighbours can indeed be combined, purely theoretically speaking. However, please see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – CAFEBABE Jan 17 '16 at 23:52
  • @CAFEBABE Apologies for the formatting. "a" is the label for my first KNN classifier; it is the presence of a bacteria species (0/1), and so forth. – Abhijeet Jan 18 '16 at 00:25

1 Answer


You can use the package RANN for this. However, as far as I know it is not exact.

library(RANN)      # nn2() for fast nearest-neighbour search
library(reshape2)  # melt()/dcast()

####
## generate some sample data and randomize order
iris.knn <- iris[sample(1:150, 150), ]
## add a second label
iris.knn["Class2"] <- iris.knn[, 5] == "versicolor"
iris.knn$org.row.id <- 1:nrow(iris.knn)
train <- iris.knn[1:100, ]
test <- iris.knn[101:150, ]
##
#####
## get nearest neighbours (train row positions coincide with org.row.id here)
nn.idx <- as.data.frame(nn2(train[1:4], query = test[1:4], k = 4)$nn.idx)
## add the test row id
nn.idx$test.row.id <- test$org.row.id

## classes keyed by row id
multiclass.vec <- data.frame(org.row.id = 1:150, iris.knn[, 5:6])
## 1 row per nearest neighbour
melted <- melt(nn.idx, id.vars = "test.row.id")
merged <- merge(melted, multiclass.vec, by.x = "value", by.y = "org.row.id")
## aggregate a single class by majority vote among the neighbours
aggregate(merged$Species, list(merged$test.row.id),
          function(x) names(which.max(table(x))))

#### aggregate for all classes
all.classes <- melt(merged[c("test.row.id", "Species", "Class2")],
                    id.vars = "test.row.id")
fun.agg <- function(x) {
  if (length(x) == 0) {
    ""  # <-- default value; adaptation might be needed.
  } else {
    names(which.max(table(x)))
  }
}
dcast(all.classes, test.row.id ~ variable, fun.aggregate = fun.agg, fill = NULL)

The aggregate() call handles only a single label; the melt/dcast block at the end performs the same majority vote for all labels at once. Note that all columns of the dcast output come back as strings, so you may need to adapt the default value in fun.agg.

CAFEBABE
  • I actually tried doing it with mldr() last night and made some progress. But I'm gonna follow your suggestion too because it makes more sense to me. I have 13 labels, so a follow-up: can I aggregate the next 12 labels in parallel with a single melt operation? – Abhijeet Jan 18 '16 at 10:52
  • Adapted the code accordingly. However, all columns are then at the end string columns. You might need to adapt the default value in fun.agg. (If this answer helped you it would be nice to up vote and/or accept) – CAFEBABE Jan 18 '16 at 17:46
  • I ended up using mldr to check for validity of my individual knn classifiers. I have put the code [here](http://pastebin.com/HyEqpKWg). – Abhijeet Jan 18 '16 at 23:24
  • in fact I also did classification using SVM and RandomForest and RandomForest seems to give a slightly better result than knn in this case (on my ACTUAL test data). I upvoted the answer but it doesn't accept my vote. – Abhijeet Jan 18 '16 at 23:26
  • Hey, can you please share the data too, or at least its structure? – nikki Jul 03 '18 at 10:43