
I am using caret to tune an MLP with 10-fold CV (repeated 5 times). I would like the summary output to include the prSummary metrics (F1, Precision, Recall) as well as the standard Accuracy and Kappa scores.

  • With caret::defaultSummary() I get the desired Accuracy and Kappa values; however, F1, Precision, and Recall are missing.
  • When the prSummary() function is used, the opposite is true: Kappa and Accuracy are missing.
  • Is there a way to get both sets of metrics at once? I provide a toy example with the iris dataset below, with one class removed to get a binary classification problem. (A sketch of the kind of combined summary I have in mind follows this list.)
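
Presumably something like the following custom summary function is what I'm after (an untested sketch; combinedSummary is a made-up name). trainControl() accepts any function with the (data, lev, model) signature, so the two built-in summaries could simply be concatenated:

combinedSummary <- function(data, lev = NULL, model = NULL) {
  c(defaultSummary(data, lev, model),  # Accuracy, Kappa
    prSummary(data, lev, model))       # AUC, Precision, Recall, F
}
# then pass summaryFunction = combinedSummary to trainControl()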

Q2) On a side note: is it advisable to use the seeds parameter as I did for reproducibility of the cross-validation? Since the seed values themselves are drawn with random sampling, my code is probably still not reproducible, right?

########################## Info ############################
# Toy Example - F1, Precision & Recall averaged over folds
#
########################## Preparation & Libraries ############################
# load libraries
library("dplyr")      # data wrangling
library("ggplot2")    # plotting
library("mlbench")    # benchmark data sets
library("caret")      # model training & hyperparameter tuning
library("tictoc")     # simple timing

# keep two of the three iris classes -> binary classification problem
df1 <- iris %>%
  rename(Class = Species) %>%
  filter(Class %in% c("versicolor", "setosa"))
df1$Class <- factor(df1$Class)  # drop the unused "virginica" level

########################## Caret Preparation ############################
k.folds <- 10
# `seeds` must be a list of length (number * repeats) + 1 = 51: the first 50
# elements need one seed per tuning candidate (4 here), and the last element
# is a single seed for the final model fit. Note that rep(list(sample(...)))
# evaluates sample() only once, so all 50 resamples reuse the same four seeds.
df1.seeds <- c(rep(list(sample(1:10000, 4, replace = T)), 50),
               sample(1:10000, 1, replace = T))
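# A sketch of a reproducible alternative (df1.seeds.repro is a made-up name,
# not used below): fix the RNG first, then draw a distinct vector per resample.
set.seed(1)
df1.seeds.repro <- c(lapply(1:50, function(i) sample.int(10000, 4)),
                     sample.int(10000, 1))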
df1.control <- trainControl(   # 10-fold CV, repeated 5 times
  method = "repeatedcv",
  number = k.folds,
  repeats = 5,
  classProbs = T,              # class probabilities are required by prSummary
  seeds = df1.seeds,
  # also tried: defaultSummary (Accuracy, Kappa) and twoClassSummary (ROC, Sens, Spec)
  summaryFunction = prSummary  # AUC, Precision, Recall, F
  # savePredictions = T
)

########################## Hyperparameter Tuning NeuralNet (MLP) ############################
df1.tunegrid <- expand.grid(.size = 1:(ncol(df1) - 1))  # hidden-layer sizes 1 to 4
metric <- "Accuracy"  # note: prSummary does not compute Accuracy, so caret
                      # will warn and fall back to the first metric (AUC)

set.seed(1337)
tic("MLP DF1, Hyperparameter Strategy 1: Grid Search")
mlp_df1 <- train(Class ~ ., data = df1, method = "mlp", metric = metric,
                 tuneGrid = df1.tunegrid, trControl = df1.control)
toc()
print(mlp_df1)
# plot(mlp_df1)
print(mlp_df1$bestTune)
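# Sketch: inspect the per-resample metrics behind the reported averages
head(mlp_df1$resample)  # one row per fold/repeat for the final model
mlp_df1$results         # metrics averaged over all 50 resamples, per size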
  • For reproducibility it would be best to use predefined folds created with one of the functions described here: https://www.rdocumentation.org/packages/caret/versions/6.0-84/topics/createDataPartition. If you need additional explanations, post a comment and I will provide them in an answer. – missuse Jul 24 '19 at 20:14
  • @missuse Thanks a lot, missuse. I upvoted your answer as well as the other question. That was exactly what I was looking for. Since this is a toy example I didn't implement a proper split into folds. For my actual data I am using groupKFold() from caret (since I have dependent samples: multiple trials from the same person). – Björn Jul 25 '19 at 07:18
  • Do you have an opinion on my use of seeds? Should I drop this line of code? – Björn Jul 25 '19 at 07:21
  • Tagged [tag:cross-validation], [tag:k-fold]. Best not to ask two questions in one; ask separate questions. – smci Jul 29 '19 at 01:07
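
For reference, a rough sketch of the approach missuse describes (assuming caret::createMultiFolds() is the intended helper; cv.folds and ctrl.repro are made-up names): precompute the resampling indices once under a fixed seed and pass them via index, which makes the seeds bookkeeping unnecessary:

set.seed(1337)
cv.folds <- createMultiFolds(df1$Class, k = 10, times = 5)
ctrl.repro <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                           index = cv.folds, classProbs = TRUE,
                           summaryFunction = prSummary)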

0 Answers