
Can you help me, please?

I have obtained very different results from caret using the same syntax. For example:

model = caret::train(y ~., data = train_set, method = 'mlpKerasDecay', preProc = "range", trControl = fitControl)

The output of predict(model):

   [9] -1.504384160 -0.327290207  0.167853981 -0.880181074  0.177923009  0.091040477 -0.188434765  0.202333793
  [17]  0.723083436 -0.186078161  0.158884823 -1.461138010  0.164124057  0.161260575  0.060420953 -2.196595907
  [25] -0.450853169 -1.209836602 -1.148020625 -0.028707385  0.272781074 -1.504384160 -0.327290207  0.167853981
  [33] -0.880181074  0.177923009  0.124389857 -0.246523201 -0.188434765  0.202333793  0.723083436 -0.186078161
  [41]  0.158884823 -1.461138010  0.164124057  0.161260575  0.161932811  0.148020968  0.127041399 -1.209836602
  [49] -1.148020625 -0.336488754  0.272781074  0.167853981 -0.880181074  0.177923009  0.124389857  0.091040477 

But with KNN:

model = caret::train(y ~., data = train_set, method = 'knn', preProc = "range", trControl = fitControl) 

The output of predict(model):

  [13] 12.61154 12.36686 12.46844 12.20607 12.42922 12.48749 12.46844 12.46844 12.46844 12.38839 12.42014 12.72458
  [25] 12.51090 12.73310 12.62519 12.37846 12.56763 12.72633 12.53659 12.61154 12.61154 12.61154 12.20607 12.53715
  [37] 12.46844 12.20607 12.42922 12.48749 12.46844 12.46844 12.46844 12.38839 12.55076 12.34583 12.38839 12.73310
  [49] 12.62519 12.67508 12.56763 12.61154 12.61154 12.61154 12.20607 12.36686 12.46844 12.20607 12.42922 12.46844 

As you can see, the order of magnitude is different. My questions:

Why?

What do I have to do to reverse the scale of the multilayer perceptron's predictions?

I tried convert_response() (source: preProc = c("center", "scale") meaning in caret's package (R) and min-max normalization), but the results do not seem to be on a consistent scale the way KNN's are.
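For reference, if the predictions really were min-max normalized against the training outcome, undoing that by hand would look roughly like this (a minimal sketch; it assumes the outputs are on the scale (y - min(y)) / (max(y) - min(y)), which may well not be what mlpKerasDecay is actually returning):

# Hypothetical inverse of a min-max ("range") transform, using the training outcome.
# Assumption: predictions are on the scale (y - min(y)) / (max(y) - min(y)).
unscale_minmax = function(pred, y) {
  pred * (max(y) - min(y)) + min(y)
}

unscale_minmax(predict(model), train_set$y)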

OK, I could build a keras model manually, component by component, but how can I solve this within caret?

EDIT: An applied example:

Libraries:

library(caret)
library(keras)
library(plyr)
library(recipes)
library(tensorflow)
library(dplyr) 

Set up:

fitControl = trainControl(method = "repeatedcv", number = 5, repeats = 5)
train_set = structure(list(y = c(12.5061772379805, 12.3883942023241, 12.7656884334656, 
12.6760762747759, 12.4292161968444, 12.6115377536383), banos = c(1, 
1, 1, 1, 1, 1), lon = c(-70.65409, -70.6471, -70.64788, -70.64177, 
-70.67638, -70.64213), lat = c(-33.43636, -33.43623, -33.45287, 
-33.44923, -33.43112, -33.44331)), row.names = c(2L, 4L, 7L, 
8L, 10L, 11L), class = "data.frame")

You will receive this warning: Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : No variation for for: banos. This happens because the snippet above is only part of my full data frame, for which dim(train_set) returns 8202 63 (yes, I still have to clean it).
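As an aside, zero-variance columns like banos can be dropped before training; a minimal sketch using caret's nearZeroVar:

# Indices of (near-)zero-variance predictors, e.g. banos in the snippet above
nzv = caret::nearZeroVar(train_set)
if (length(nzv) > 0) train_set = train_set[, -nzv]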

Run:

set.seed(1234)
model = caret::train(y ~., data = train_set, method = "mlpKerasDecay", preProc = "range", trControl = fitControl)
predict(model)

Results (these may differ on your machine):

-0.6769148 -0.7869630 -1.0850035 -1.1153764 -0.2204445 -0.9990849

The problem is visible here: the range of y in train_set is roughly [12, 13], but the predictions appear to be on a normalized scale.
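A quick way to see the mismatch side by side (a minimal sketch):

# Compare the scale of the observed outcome with the scale of the predictions
range(train_set$y)      # about 12.39 to 12.77
range(predict(model))   # about -1.12 to -0.22 in the run above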

Time elapsed (on a 10th-gen Intel i7 with an RTX 2080):

$everything
   user  system elapsed 
 209.39   13.01  961.49 

$final
   user  system elapsed 
   0.73    0.01    3.86 

Have a nice week!

Kind regards, Mirko.

  • Provide a minimal reproducible example; visit [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – UseR10085 Oct 19 '20 at 06:49
  • Done, @BappaDas. Thanks so much. Have a nice day. Best regards, Mirko. – Mirko Bozanic Leal Oct 19 '20 at 14:02
  • If you check the `mlpKerasDecay` model's performance, you can see its `RMSE` is vastly inferior to the knn model's. If you check the [source for the model](https://github.com/topepo/caret/blob/master/models/files/mlpKerasDecay.R), you can see this is a very simple keras model with just one layer. Perhaps it cannot do better. – missuse Oct 19 '20 at 18:19
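(For context, the resampled performance missuse refers to can be read straight off the fitted train objects; a minimal sketch, with hypothetical names model_mlp and model_knn for the two fits shown above:)

# Resampled RMSE across each model's tuning grid (model_mlp / model_knn are
# hypothetical names for the mlpKerasDecay and knn fits from the question)
model_mlp$results$RMSE
model_knn$results$RMSE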

1 Answer

Following @missuse's advice: the keras configuration in caret is indeed too limited. I preferred to take the optimal parameters and build a network manually. In case it serves someone, here is my code (final configuration).
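(On "take the optimal parameters": the values caret selected can be read from the train object; a minimal sketch:)

# Hyperparameters caret picked for the mlpKerasDecay fit, plus the full grid
model$bestTune
model$results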

# Normalize the data via a feature spec (feature_spec and friends come from tfdatasets)
library(tfdatasets)

spec = feature_spec(train_set, y ~ . ) %>% 
  step_numeric_column(all_numeric(), normalizer_fn = scaler_standard()) %>% 
  fit()

# Dense feature layer built from the spec (dense_features(spec) is also used
# directly inside build_model() below)
layer = layer_dense_features(
  feature_columns = dense_features(spec), 
  dtype = tf$float32
)

# Model: input feature layer followed by dense + batch-norm + dropout blocks
build_model = function() {
  input = layer_input_from_dataset(train_set %>% select(-y))
  
  output = input %>% 
    layer_dense_features(dense_features(spec)) %>% 
    layer_batch_normalization() %>%
    layer_dropout(rate = 0.4) %>%
    layer_dense(units = 128, activation = "relu") %>%
    layer_batch_normalization() %>%
    layer_dropout(rate = 0.3) %>%
    layer_dense(units = 64, activation = "relu") %>%
    layer_batch_normalization() %>%
    layer_dropout(rate = 0.3) %>%
    layer_dense(units = 32, activation = "relu") %>%
    layer_batch_normalization() %>%
    layer_dropout(rate = 0.2) %>%
    layer_dense(units = 1) 
  
  model = keras_model(input, output)
  
  model %>% 
    compile(
      loss = "mse",
      optimizer = optimizer_rmsprop(),
      metrics = list("mean_absolute_error")
    )
  
  return(model)
}

# Early-stopping callback (note: the fit() call below constructs its own callbacks)
early_stop = callback_early_stopping(monitor = "val_loss", patience = 20)

model = build_model()

history = model %>% fit(
  x = train_set %>% dplyr::select(-y),
  y = train_set$y,
  epochs = 500,
  validation_split = 0.3,
  verbose = 0,
  callbacks = list(
    callback_early_stopping(patience = 50),
    callback_reduce_lr_on_plateau(factor = 0.05)
  )
)

# Plot the training history to see how the error metrics evolve:
plot(history)

# Finally: the predictions come back on the original scale of y
predicted_trvalues   = as.vector(model %>% predict(train_set %>% select(-y)))
predicted_testvalues = as.vector(model %>% predict(test_set %>% select(-y)))

I obtained good performance: MAE = 0.1326 on the training set and MAE = 0.1331 on the test set. However, Rsquared is negative or close to zero for both sets. Even so, it seems that is not necessarily "so bad" in this particular case (source: https://stats.stackexchange.com/questions/12900/when-is-r-squared-negative).
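(For anyone reproducing this, those metrics can be computed with caret's postResample, which returns RMSE, Rsquared, and MAE; a minimal sketch, assuming test_set holds the held-out data:)

# MAE and Rsquared for the train and test predictions
caret::postResample(pred = predicted_trvalues,   obs = train_set$y)
caret::postResample(pred = predicted_testvalues, obs = test_set$y)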

Thank you so much.

Best regards, Mirko.