I am using the e1071 'tune' function to optimize an SVM model, and I would like to optimize for F1 instead of accuracy. According to the post "Optimize F-score in e1071 package", I need to define a new error.fun. However, the function shown in that post was never confirmed to be the solution, and it does not work for me. If I knew the variable names for the predictions from each iteration of tune, I could write a function to calculate F1, but I don't know how to get those values. How can I calculate F1 and use it to optimize model parameters using 'tune' in e1071? My code is as follows:

tuned <- tune.svm(PriYN ~ ., data = dataset,
                  kernel = "radial", probability = TRUE,
                  gamma = 10^(-5:-1), cost = 10^(-3:1),
                  tunecontrol = tune.control(cross = 10))
Jamie

1 Answer

Using {caret}, you can tune on the F1 score by passing prSummary as the summaryFunction in trainControl:

library(caret)

ctrl <- trainControl(method = "repeatedcv", # choose your CV method
                     number = 5, # number of folds (depends on the CV method)
                     repeats = 2, # number of repeats (depends on the CV method)
                     summaryFunction = prSummary, # TO TUNE ON F1 SCORE (needs the MLmetrics package)
                     classProbs = TRUE, # class levels must be valid R names
                     verboseIter = TRUE
                     #sampling = "smote" # you can try the 'smote' resampling method
)

Then tune your model:

set.seed(2202)
# prSummary treats the first factor level of the outcome as the positive class
svm_model <- train(target ~ ., data = training,
                   method = "svmRadial",
                   #preProcess = c("center", "scale"),
                   tuneLength = 10,
                   metric = "F", # The metric used for tuning is the F1 SCORE
                   trControl = ctrl)
svm_model
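
If you would rather stay with e1071, here is a minimal sketch of the error.fun route mentioned in the question. tune.control accepts an error.fun that takes the true values and the predicted values of each validation fold and returns a number to minimize, so the sketch returns 1 - F1. The names f1_error, positive and toy are hypothetical, the positive label "Y" is an assumption about the levels of PriYN, and the toy data only stands in for the real dataset.

```r
# Error function for e1071::tune. It receives the true and predicted labels
# of one validation fold and returns a value to MINIMIZE, hence 1 - F1.
# `positive` is an assumed label; change it to match the levels of PriYN.
f1_error <- function(true, predicted, positive = "Y") {
  tp <- sum(predicted == positive & true == positive)
  fp <- sum(predicted == positive & true != positive)
  fn <- sum(predicted != positive & true == positive)
  if (tp == 0) return(1)  # treat F1 as 0 when there are no true positives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  1 - 2 * precision * recall / (precision + recall)
}

if (requireNamespace("e1071", quietly = TRUE)) {
  # Toy two-class data standing in for `dataset` (assumption)
  set.seed(1)
  toy <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
  toy$PriYN <- factor(ifelse(toy$x1 + toy$x2 + rnorm(100, sd = 0.5) > 0, "Y", "N"))

  tuned <- e1071::tune.svm(PriYN ~ ., data = toy, kernel = "radial",
                           gamma = 10^(-3:-1), cost = 10^(0:1),
                           tunecontrol = e1071::tune.control(cross = 5,
                                                             error.fun = f1_error))
  print(tuned$best.parameters)
}
```

With this in place, the "error" reported in the tuning results is 1 - F1 averaged over the folds, so the best parameters are the ones with the highest cross-validated F1.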
tgoronflot
This works pretty well with my data, but it creates a problem that I have not figured out how to deal with. My data has 316 variables, some of which are constant or nearly constant in a few folds. When I run this code I therefore get problems with "Variable(s) `' constant. Cannot scale data.", which I do not have with the e1071 implementation. I think e1071 simply ignores the constant features while caret does not. – Jamie Jan 29 '20 at 15:39
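
One option that may help with the constant-variable message is to drop zero-variance and near-zero-variance predictors before training (caret's nearZeroVar) and to add "zv" to preProcess so that predictors that become constant inside a resample are removed there as well. This is only a sketch under assumptions: the toy data frame stands in for training, the column names x1, x2 and const are hypothetical, and it has not been checked against the 316-variable data described above.

```r
# Toy stand-in for `training`: two informative predictors, one constant column
set.seed(2202)
n <- 120
training <- data.frame(x1 = rnorm(n), x2 = rnorm(n), const = 1)
# "yes" is listed first so that prSummary treats it as the positive class
training$target <- factor(
  ifelse(training$x1 - training$x2 + rnorm(n, sd = 0.5) > 0, "yes", "no"),
  levels = c("yes", "no")
)

# Base-R check: which predictors have a single unique value?
predictors <- setdiff(names(training), "target")
is_constant <- vapply(training[, predictors],
                      function(col) length(unique(col)) <= 1, logical(1))
print(names(is_constant)[is_constant])

needed <- c("caret", "kernlab", "MLmetrics")
if (all(vapply(needed, requireNamespace, logical(1), quietly = TRUE))) {
  library(caret)

  # Drop predictors that are (near-)constant over the whole data set
  nzv_cols <- predictors[nearZeroVar(training[, predictors])]
  training_reduced <- training[, setdiff(names(training), nzv_cols)]

  ctrl <- trainControl(method = "cv", number = 5,
                       summaryFunction = prSummary, classProbs = TRUE)

  # "zv" removes predictors that are constant within a resample
  svm_model <- train(target ~ ., data = training_reduced,
                     method = "svmRadial",
                     preProcess = c("zv", "center", "scale"),
                     tuneLength = 3,
                     metric = "F",
                     trControl = ctrl)
  print(svm_model)
}
```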