
I'm practicing logistic regression with a binary outcome, so I decided to take the iris dataset and drop 'versicolor'. But when I try to train a model, it gives an error: "error: One or more factor levels in the outcome has no data: 'versicolor'"

I don't get it; I thought I had excluded this level! There is clearly something fundamental I'm missing. What don't I understand? Thank you.

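    library(dplyr)
    library(caret)  # loaded before use: createFolds(), trainControl() and train() come from caret

    # Keep every row except the versicolor ones
    test_iris <- iris %>%
      filter(Species != "versicolor")

    # Cross-validation folds and resampling settings
    myFolds <- createFolds(test_iris, k = 5)
    myControl <- trainControl(
      summaryFunction = twoClassSummary,
      classProbs = TRUE,
      verboseIter = TRUE,
      savePredictions = TRUE,
      index = myFolds
    )

    # Logistic regression on the remaining two species; this call raises the error
    model1 <- train(
      Species ~ ., test_iris,
      metric = "ROC",
      method = "glm",
      family = binomial,
      trControl = myControl
    )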
khhc
  • Hi khhc. As pointed out by @MrFlick, factors are strange R pets and can behave unexpectedly. See for example [here](http://monashbioinformaticsplatform.github.io/2015-09-28-rbioinformatics-intro-r/01-supp-factors.html) for a tour of the different problems that might arise with them. – Gilles San Martin Mar 02 '18 at 22:01
  • Thanks for that. R can be a real struggle sometimes :-). – khhc Mar 02 '18 at 22:54
  • By the way, should I delete questions when they are marked as duplicates like this? – khhc Mar 02 '18 at 22:56
  • No! Your question was clearly asked, with a reproducible example, and the answer may not be so easy for beginners to find. It will point other users in the exact same situation to the other answer pointed to by MrFlick. See [here](https://meta.stackoverflow.com/questions/265736/should-i-delete-my-question-if-it-is-marked-as-a-duplicate) for more information. – Gilles San Martin Mar 02 '18 at 23:35
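
The factor behavior mentioned in the comments can be reproduced with a minimal base R sketch. It assumes only the built-in iris data; the names `levels_before` and `levels_after` are made up for illustration. Filtering removes the rows, but the factor keeps 'versicolor' as a declared level until the unused level is dropped, which is most likely what the error message is complaining about.

```r
# Base R only; iris ships with R.
test_iris <- iris[iris$Species != "versicolor", ]

# The versicolor rows are gone, yet the factor still declares three levels.
levels_before <- levels(test_iris$Species)
print(levels_before)            # "setosa" "versicolor" "virginica"
print(table(test_iris$Species)) # versicolor is listed with a count of 0

# droplevels() discards factor levels that no longer occur in the data.
test_iris$Species <- droplevels(test_iris$Species)
levels_after <- levels(test_iris$Species)
print(levels_after)             # "setosa" "virginica"
```

With the empty level removed, the outcome is a genuine two-level factor, which is what a binomial model needs. Separately, caret documents the first argument of `createFolds()` as the outcome vector, so `createFolds(test_iris$Species, k = 5)` is probably closer to what was intended than passing the whole data frame.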
