I am trying to optimize the parameter k of kNN using a genetic algorithm in R. I tried the following code but still get an error. As the fitness function I used the accuracy of the kNN model for the selected k value. Please help if you are familiar with kNN and genetic algorithms. Here is what I have done.

library(caret)
library(GA)
library(class)

#data import 
tea_jenis_F3 <- read.csv("D:/inggrit/program/F3.csv")
str(tea_jenis_F3)

#to check missing data 
anyNA(tea_jenis_F3)

#data slicing
set.seed(101)
intrain_jenis_F3 <- createDataPartition(tea_jenis_F3$category, p= 0.7, list = FALSE)
training_jenis_F3 <- tea_jenis_F3 [intrain_jenis_F3,]
testing_jenis_F3 <- tea_jenis_F3 [-intrain_jenis_F3,]

#transforming the dependent variable to a factor 
training_jenis_F3[["category"]] = factor(training_jenis_F3[["category"]])

#fitness function
fitness_KNN <- function(chromosome)
{
  # First values in chromosome are 'k' of 'knn' method
  tuneGrid <- data.frame(k=chromosome[1])


  # train control
  train_control <- trainControl(method = "cv",number = 10)

  # train the model
  set.seed(1234)
  model <- train(category ~ ., data= training_jenis_F3, trControl=train_control, 
                 method="knn", tuneGrid=tuneGrid)

  # Extract accuracy statistics
  accuracy_val <- model$results$accuracy

}


GA <- ga(type = "real-valued", fitness = fitness_KNN, lower = -10, upper = 10, monitor = NULL)

Error:

Something is wrong; all the Accuracy metric values are missing:
Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1     NA's   :1    
Error: Stopping
In addition: There were 11 warnings (use warnings() to see them)

I would be grateful if you can help me. Thank you

  • Welcome to SO! Please, read [this](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1), to add a minimal dataset. Also which error do you get and where? – s__ Aug 20 '18 at 12:14
  • @s_t I got this error: Something is wrong; all the Accuracy metric values are missing: Accuracy Kappa Min. : NA Min. : NA 1st Qu.: NA 1st Qu.: NA Median : NA Median : NA Mean :NaN Mean :NaN 3rd Qu.: NA 3rd Qu.: NA Max. : NA Max. : NA NA's :1 NA's :1 Error: Stopping In addition: There were 11 warnings (use warnings() to see them) – Inggrit Fauzan Aug 20 '18 at 14:39

1 Answer

I think the problem does not lie in your code, but in the method: Using a genetic algorithm to optimize k in this setting is not possible and also not necessary.

You called ga(type = "real-valued", lower = -10, upper = 10, ...) which means ga will search for the best value between -10 and 10. There are now two problems:

  1. Negative values of k are not valid for knn: k must be a positive number of neighbours
  2. ga will produce non-integer values such as 1.234 for k, which are not valid either
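
Both problems can be illustrated with a small base-R sketch. It does not call ga itself; as an assumption, it simply draws uniform random numbers from [-10, 10] to mimic what a real-valued search over that range would propose, and then counts how many of those candidates would be usable as k. The names `candidates` and `is_valid_k` are made up for this illustration.

```r
# Hypothetical illustration: mimic real-valued candidates from [-10, 10]
# (plain uniform draws, not the GA package itself).
set.seed(1)
candidates <- runif(50, min = -10, max = 10)

# A usable k for knn is a positive whole number.
is_valid_k <- candidates >= 1 & candidates == round(candidates)

print(head(candidates))
cat("usable k values among", length(candidates), "candidates:",
    sum(is_valid_k), "\n")
```

Roughly half of the draws are negative, and the positive ones are almost never whole numbers, so the fitness function is handed values that knn cannot work with.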

Fortunately, it is not necessary to use such a complicated method as genetic algorithms in this case. If you want to find the best k in the range [1, 10] just compute the model for each value like this:

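library(caret)

k_cands <- 1:10
accuracy <- numeric(length(k_cands))
train_control <- trainControl(method = "cv", number = 10)

for (i in seq_along(k_cands)) {
  set.seed(1234)  # use the same CV folds for every k
  model <- train(category ~ ., data = training_jenis_F3, method = "knn",
                 trControl = train_control,
                 tuneGrid = data.frame(k = k_cands[i]))
  # caret names the result column "Accuracy" (capital A)
  accuracy[i] <- model$results$Accuracy
}

best_k <- k_cands[which.max(accuracy)]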
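
As a possible alternative to an explicit loop, caret can evaluate several values of k in a single call when the whole candidate set is passed as tuneGrid. The following is a minimal sketch, assuming the built-in iris data as a stand-in for the asker's data set (which is not available here) and its Species column in place of category:

```r
library(caret)

# iris is only a stand-in for the asker's data.
set.seed(1234)
model <- train(Species ~ ., data = iris, method = "knn",
               trControl = trainControl(method = "cv", number = 10),
               tuneGrid = data.frame(k = 1:10))

# One row per candidate k, with its cross-validated accuracy
print(model$results[, c("k", "Accuracy")])

# The k that caret selected
best_k <- model$bestTune$k
print(best_k)
```

Here `model$bestTune$k` holds the k that caret picked based on cross-validated accuracy, so no manual bookkeeping of the accuracy vector is needed.
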
AEF