
I am quite new to the neural network world, so I ask for your understanding. I am running some tests and have a question about the size and decay parameters. I use the caret package with the nnet method. Example dataset:

require(mlbench)
require(caret)
require(nnet)

data(Sonar)
mydata=Sonar[,1:12] 

set.seed(54878)
ctrl = trainControl(method="cv", number=10,returnResamp = "all")
for_train= createDataPartition(mydata$V12, p=.70, list=FALSE) 
my_train=mydata[for_train,]
my_test=mydata[-for_train,] 

t.grid=expand.grid(size=5,decay=0.2)
mymodel = train(V12~ .,data=my_train,method="nnet",metric="Rsquared",trControl=ctrl,tuneGrid=t.grid) 

So, I have two questions. First, is this the best way to use the nnet method with caret? Second, I have read about size and decay (e.g. Purpose of decay parameter in nnet function in R?) but I cannot understand how to use them in practice here. Can anyone help?

les
1 Answer


Brief Caret explanation

The caret package lets you train different models and tune their hyper-parameters using cross-validation (hold-out or k-fold) or the bootstrap.

There are two different ways to tune hyper-parameters with caret: Grid Search and Random Search. If you use Grid Search (brute force) you define a grid of candidate values for every parameter according to your prior knowledge, or you can fix some parameters and iterate over the remaining ones. If you use Random Search you specify a tuning length (the maximum number of candidate combinations) and caret samples random hyper-parameter values until that budget is exhausted.

No matter which method you choose, caret trains the model with each combination of hyper-parameters and computes performance metrics as follows:

  1. Split the initial training samples into two sets, Training and Validation (for the bootstrap or hold-out cross-validation), or into k folds (for k-fold cross-validation).

  2. Train the model on the training set and predict on the validation set (for hold-out cross-validation and the bootstrap), or train on k-1 folds and predict on the remaining fold (for k-fold cross-validation).

  3. On the validation set caret computes performance metrics such as ROC, Accuracy, ...

  4. Once the grid search has finished or the tune length is exhausted, caret uses the performance metrics to select the best model according to the previously defined criterion (you can use ROC, Accuracy, Sensitivity, RSquared, RMSE, ...).

  5. You can create plots to understand the resampling profile and to pick the best model (keep both performance and complexity in mind).

If you need more information about caret, you can check the caret web page.
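To make the Random Search option above concrete, here is a hypothetical sketch using the question's own objects (my_train, V12); `search = "random"` and `tuneLength` are the caret knobs that replace an explicit grid:

```r
# Hypothetical sketch: Random Search with caret instead of a fixed grid.
# 'search = "random"' makes caret sample hyper-parameter values;
# tuneLength caps how many random (size, decay) combinations are tried.
library(caret)

ctrl_random <- trainControl(method = "cv",
                            number = 10,
                            search = "random")

# my_train / V12 come from the question's Sonar example above:
# rs_model <- train(V12 ~ ., data = my_train,
#                   method = "nnet",
#                   metric = "Accuracy",
#                   trControl = ctrl_random,
#                   tuneLength = 15,     # 15 random combinations
#                   trace = FALSE)
```

Random search is often a good first pass when you have no prior idea of a sensible range for decay.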

Neural Network Training Process using Caret

When you train a neural network (nnet) using caret you need to specify two hyper-parameters: size and decay. size is the number of units in the hidden layer (nnet fits a single-hidden-layer neural network) and decay is the regularization parameter used to avoid over-fitting. Keep in mind that the names of the hyper-parameters can differ from one R package to another.
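To see where these two parameters actually land, here is a hypothetical direct call to nnet outside of caret, reusing the question's my_train / V12 objects; caret simply tries many such calls with different values:

```r
# Hypothetical sketch: calling nnet directly, outside of caret.
# size and decay are passed straight through to the fitting routine;
# my_train and V12 come from the question's Sonar example above.
library(nnet)

fit <- nnet(V12 ~ ., data = my_train,
            size  = 5,       # 5 units in the single hidden layer
            decay = 0.2,     # weight-decay (L2) penalty strength
            maxit = 200,     # raise nnet's default of 100 iterations
            trace = FALSE)   # silence per-iteration output
```

Larger size increases model capacity; larger decay shrinks the weights harder, so the two are usually tuned together.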

An example of training a Neural Network using Caret for classification:

fitControl <- trainControl(method = "repeatedcv", 
                           number = 10, 
                           repeats = 5, 
                           classProbs = TRUE, 
                           summaryFunction = twoClassSummary)

nnetGrid <- expand.grid(size = seq(from = 1, to = 10, by = 1),
                        decay = seq(from = 0.1, to = 0.5, by = 0.1))

nnetFit <- train(Label ~ ., 
                 data = Training,
                 method = "nnet",
                 metric = "ROC",
                 trControl = fitControl,
                 tuneGrid = nnetGrid,
                 trace = FALSE)  # nnet's own argument; silences training output

Finally, you can make some plots of the resampling results to compare hyper-parameter settings. The following plot was generated from a GBM training process:

[plot: GBM training process using caret]
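For the nnet example above, the same kind of profile plot comes straight from caret's plot methods on the fitted train object (a sketch assuming nnetFit from the code above has been fitted):

```r
# Visualize resampling results for the fitted model (assumes nnetFit
# from the example above). caret provides plot() and ggplot() methods
# for train objects that show the metric across the tuning grid.
plot(nnetFit)                    # ROC profile over size and decay
ggplot(nnetFit)                  # ggplot2 version of the same profile
plot(nnetFit, metric = "Sens")   # profile a different summary metric
```

These plots make it easy to spot where extra hidden units stop helping and which decay values over- or under-regularize.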

Jorge Quintana
  • I'd try something along the lines of: `nnet_grid <- expand.grid(.decay = c(0.5, 0.1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7), .size = c(3, 5, 10, 20))`, then try other parameters based on the results. A range between 0.1 and 0.5 for decay is a pretty small parameter space. – marbel Nov 20 '17 at 17:39
  • Marbel, it is a small parameter space, indeed. The code is a small example of how to train a neural network using caret, not a detailed review of hyper-parameter tuning. – Jorge Quintana Nov 20 '17 at 23:19
  • You can always edit your answer; I thought it was better to leave it as a comment for other people getting here. It's good to see the do's and don'ts, IMO. – marbel Nov 21 '17 at 23:46