I am receiving the following error in R when stacking using the caret package.

"Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to not5, X5sets . Please use factor levels that can be used as valid R variable names (see ?make.names for help)."

Below is the code I am trying to run.

library(caret)          # provides trainControl() and resamples()
library(caretEnsemble)  # provides caretList()

# 10-fold CV repeated 3 times, keeping predictions and class probabilities
control <- trainControl(method="repeatedcv", number=10, repeats=3,
                        savePredictions=TRUE, classProbs=TRUE)
algorithmList <- c('rpart', 'knn', 'svmRadial')
set.seed(222)
models <- caretList(Tsets ~ MatchSurface + MatchRound + AgeDiff + SameHand + HeightDiff,
                    data=up_sample, trControl=control, methodList=algorithmList)
results <- resamples(models)

When I remove classProbs=TRUE, the code runs, but I want to keep it because further code I plan to run afterwards requires it. All of my variables are factors or integers, and I have changed all classes so that they no longer contain "0"s and "1"s. I therefore can't figure out why the code won't run.

I have attached a picture of the data structure below. Would be great if anyone had some advice.

[Image: data structure]

aistow
  • Change the names of levels in `Tsets` column so they do not start with a number. – missuse Jun 25 '18 at 11:43
  • Did you look at `?make.names` like the error message suggests? It explains what is required for a column name to be valid. The error message also says specifically that `"5sets"` will not be a valid column name; run `make.names(c("not5", "5sets"))` to see this for yourself – camille Jun 25 '18 at 14:27

3 Answers

Try changing the levels of your target variable to valid R names such as "yes"/"no" instead of 1/0.

blueeyes0710

When caretList() fits a model with classProbs=TRUE (here rpart, but the same applies to random forests and other methods), the factor levels of the outcome are used as variable names for the predicted class probabilities, as the error message says. Such names may not start with a number or contain spaces. You can convert the level names of the outcome to valid labels with the following code.

library(dplyr)

# Assign the result back; otherwise up_sample itself is left unchanged
up_sample <- up_sample %>% 
  mutate(Tsets = factor(Tsets, 
                        labels = make.names(levels(Tsets))))
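
To see what the renaming does, here is a minimal base-R sketch. The data frame `toy` is a made-up stand-in for `up_sample` (only the outcome column is reproduced, using the two level names from the error message), so treat it as an illustration and not as the asker's actual data.

```r
# Hypothetical stand-in for up_sample: only the outcome column matters here.
toy <- data.frame(Tsets = factor(c("not5", "5sets", "5sets", "not5")))

# make.names() prefixes an "X" to any name that starts with a digit.
old_levels <- levels(toy$Tsets)
new_levels <- make.names(old_levels)

# Base-R equivalent of the mutate() call: rename the levels in place.
levels(toy$Tsets) <- new_levels

# After renaming, every level should pass through make.names() unchanged,
# which is the property caret needs when classProbs = TRUE.
stopifnot(identical(levels(toy$Tsets), make.names(levels(toy$Tsets))))
```

Running the same `identical()` check on the real outcome column before calling `caretList()` is a cheap way to confirm the levels are acceptable.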
Agile Bean

You have to change your trainControl options. Either set

classProbs = FALSE

or rename the levels of the outcome variable to valid names such as "Yes"/"No" instead of "1"/"0" (the new names are assigned in the order of the existing levels):

levels(var) <- c("Yes", "No")
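
Because assigning a character vector to `levels()` renames by position, it is easy to swap the classes by accident. A minimal sketch of a name-based alternative follows; the vector `var` and its "0"/"1" coding are assumptions made up for illustration.

```r
# Hypothetical outcome coded as "0"/"1".
var <- factor(c("1", "0", "0", "1"))

# Passing a named list to levels<- maps old levels to new ones explicitly:
# each list name is the new label, each value the old level it replaces.
levels(var) <- list(No = "0", Yes = "1")

# The new labels are valid R names, so class probabilities can be generated.
stopifnot(identical(levels(var), make.names(levels(var))))
```

With this form the mapping does not depend on how the original levels happen to be sorted.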
Mario Petrovic