
Hi, I know similar questions have been asked before, but none has a clear answer yet, or I tried the proposed solutions without success (see "Caret error using GBM, but not without caret" and "Caret train method complains Something is wrong; all the RMSE metric values are missing").

I am trying to use caret's training methods to predict a categorical outcome (example using publicly available data below):

library(mlbench)
data(Sonar)
str(Sonar[, 1:10])

library(caret)
set.seed(998)

Sonar$rand<-rnorm(nrow(Sonar))  ##to randomly create the new 3-category outcome
table(Sonar$rand)
Sonar$Class_new<-ifelse(Sonar$Class=="R","R",ifelse(Sonar$rand>0,"M","H"))
table(Sonar$Class_new)

fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           ## repeated ten times
                           repeats = 10)

inTraining <- createDataPartition(Sonar$Class_new, p = .75, list = FALSE)
training <- Sonar[ inTraining,]
testing  <- Sonar[-inTraining,]

gbmFit1 <- train(Class_new ~ ., data = training,
                 method = "gbm",
                 trControl = fitControl,
                 verbose = FALSE)

Whenever I use the new class variable (Class_new), which has 3 categories rather than the 2 categories of the original Class variable, I get the error and warnings below. Everything runs fine with a 2-category outcome variable, and the result is the same regardless of the train method (I tried rf, gbm, and svm).

Something is wrong; all the Accuracy metric values are missing:

    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :9     NA's   :9    

Error in train.default(x, y, weights = w, ...) : Stopping
In addition: Warning messages:
1: In train.default(x, y, weights = w, ...) :
The metric "RMSE" was not in the result set. Accuracy will be used instead.
2: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.

Any help on this is greatly appreciated!

XIANG JI
  • When you say that you are building on prior questions (i.e. "someone tried this before"), you should link to those questions in your post. – alexwhitworth Oct 23 '15 at 18:31
  • Possible duplicate of [Caret and KNN in R: predict function gives error](http://stackoverflow.com/questions/33200033/caret-and-knn-in-r-predict-function-gives-error) – phiver Oct 23 '15 at 19:17
  • Also a possible duplicate of [getting this error in Caret](http://stackoverflow.com/questions/30475723/getting-this-error-in-caret) – alexwhitworth Oct 23 '15 at 19:21
  • I edited the question and added links to the prior threads that did not work for me. – XIANG JI Oct 24 '15 at 01:40
  • Still, none of the previous threads helped, including the two above from Alex and phiver, but thanks. – XIANG JI Oct 24 '15 at 18:52

4 Answers

You need to convert the newly created Class_new to a factor, as follows:

Sonar$Class_new<-ifelse(Sonar$Class=="R","R",ifelse(Sonar$rand>0,"M","H"))
Sonar$Class_new <- factor(Sonar$Class_new)

Also, you may want to remove the variables Class and rand from your training and testing data sets; otherwise they are used as predictors of Class_new, which is derived from them. You can do something like:

training <- Sonar[ inTraining, !(names(Sonar) %in% c("Class", "rand"))]
testing <- Sonar[-inTraining, !(names(Sonar) %in% c("Class", "rand"))]
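To see why the conversion matters, here is a minimal base R sketch. The vectors `cls` and `rand` are made-up stand-ins for `Sonar$Class` and `Sonar$rand`, not the real data. `ifelse()` returns a character vector here rather than a factor, and the "RMSE" warning in the question suggests that `train()` was not treating the outcome as a classification target. `factor()` gives the outcome explicit class levels.

```r
# Made-up stand-ins for Sonar$Class and Sonar$rand (illustrative only)
cls  <- factor(c("R", "M", "M", "R", "M"))
rand <- c(0.3, -1.2, 0.8, -0.5, 1.1)

# Same construction as in the question: the result is a character vector
class_new <- ifelse(cls == "R", "R", ifelse(rand > 0, "M", "H"))
was_character <- is.character(class_new)

# Convert to a factor so the outcome carries explicit class levels
class_new <- factor(class_new)
print(class(class_new))
print(levels(class_new))
```

Checking `class()` or `str()` on the outcome column before calling `train()` is a quick way to catch this kind of problem.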
howaj

I had allowParallel = TRUE set for train (it is an argument of trainControl(), passed in via trControl), and the machine I was working on did not have multiple cores. After I commented out that setting, I no longer got the error.
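One way to avoid hard-coding the flag is to derive it from the number of detected cores. This is only a sketch: the variable names `n_cores` and `use_parallel` are illustrative, and the `trainControl()` call is left as a comment because it requires caret to be loaded. As far as I know, `allowParallel` only has an effect when a parallel backend is registered.

```r
library(parallel)  # ships with base R

# detectCores() may return NA on some platforms, so guard against that
n_cores <- detectCores()
use_parallel <- !is.na(n_cores) && n_cores > 1

# With caret loaded, the flag could then be passed on, for example:
# fitControl <- trainControl(method = "repeatedcv", number = 10,
#                            repeats = 10, allowParallel = use_parallel)
print(use_parallel)
```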

asquare

Instead of passing a formula to the train function, pass values for the parameters x, y, method, etc.

The old way:

modFit = train(data.df$Label ~ .,
               data = data.df,
               method = "rpart",
               trControl = cntrl,
               tuneLength = 7)

The new way:

modFit = train(x = data.df.cols,
               y = data.df$Label,
               method = "rpart",
               trControl = cntrl,
               tuneLength = 7)

Note: data.df.cols holds every column except the label, e.g. data.df.cols = data.df[, 2:ncol(data.df)] when the label is the first column.
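Selecting the predictors by position only works when the label happens to be the first column. A minimal sketch of selecting by name instead, using a made-up three-row data frame (the column names `f1` and `f2` and the variable `label_col` are illustrative):

```r
# Made-up data frame standing in for data.df (illustrative only)
data.df <- data.frame(Label = factor(c("a", "b", "a")),
                      f1 = 1:3,
                      f2 = c(0.5, 0.1, 0.9))

label_col <- "Label"

# Keep every column except the label; drop = FALSE keeps a data frame
# even when only one predictor column remains
data.df.cols <- data.df[, setdiff(names(data.df), label_col), drop = FALSE]
y <- data.df[[label_col]]

print(names(data.df.cols))
print(class(y))
```

These two objects can then be passed as `x = data.df.cols` and `y = y` in the `train()` call above.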

Thanks, howaj, for your post. That worked for the data I posted, but somehow it did not work for another dataset where everything seemed to be the same. I finally figured it out, though:

It could be a syntax issue. Instead of using the formula interface, train(y ~ ., data = training, ...), I switched to the x/y interface, train(x, y, ...), without specifying data = explicitly:

train(training[, !names(training) %in% "response"], training$response, ...)

This worked.

XIANG JI
  • You're welcome. As far as I know, some packages accept only one form (formula interface or x and y interface, but not both). – howaj Oct 25 '15 at 02:00
  • I also found that syntax mattered depending on which `method` was used in `train()`. – Brian D Nov 09 '18 at 17:41