I want to fit a linear regression on the wine data set using 10-fold cross-validation, but my class variable has three levels: '1', '2' and '3'.
This is the code so far.
require(boot, quietly = TRUE)
require(caret)
wine_data <- read.csv("wine.data", header = FALSE) # wine data from https://archive.ics.uci.edu/ml/machine-learning-databases/wine/ (the file has no header row)
colnames(wine_data)<-c("Class","Alcohol","Malic acid", "Ash", "Alcalinity of ash", "Magnesium","Total phenols", "Flavanoids", "Nonflavanoid phenols","Proanthocyanins","Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline")
wine_data_lr<- wine_data
wine_data_lr$Class<-as.numeric(wine_data_lr$Class)
wine_data_lr$Magnesium<-as.numeric(wine_data_lr$Magnesium)
wine_data_lr$Proline<-as.numeric(wine_data_lr$Proline)
str(wine_data_lr)
'data.frame': 178 obs. of 14 variables:
$ Class : num 1 1 1 1 1 1 1 1 1 1 ...
$ Alcohol : num 14.2 13.2 13.2 14.4 13.2 ...
$ Malic acid : num 1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
$ Ash : num 2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
$ Alcalinity of ash : num 15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
$ Magnesium : num 127 100 101 113 118 112 96 121 97 98 ...
$ Total phenols : num 2.8 2.65 2.8 3.85 2.8 3.27 2.5 2.6 2.8 2.98 ...
$ Flavanoids : num 3.06 2.76 3.24 3.49 2.69 3.39 2.52 2.51 2.98 3.15 ...
$ Nonflavanoid phenols : num 0.28 0.26 0.3 0.24 0.39 0.34 0.3 0.31 0.29 0.22 ...
$ Proanthocyanins : num 2.29 1.28 2.81 2.18 1.82 1.97 1.98 1.25 1.98 1.85 ...
$ Color intensity : num 5.64 4.38 5.68 7.8 4.32 6.75 5.25 5.05 5.2 7.22 ...
$ Hue : num 1.04 1.05 1.03 0.86 1.04 1.05 1.02 1.06 1.08 1.01 ...
$ OD280/OD315 of diluted wines: num 3.92 3.4 3.17 3.45 2.93 2.85 3.58 3.58 2.85 3.55 ...
$ Proline : num 1065 1050 1185 1480 735 ...
ctrl <- trainControl(method = "cv", number = 10, savePredictions = TRUE)
lr_mod_fit <- train(Class ~ ., data=wine_data_lr, method="glm", family="binomial",trControl = ctrl, tuneLength = 5)
RMSE Rsquared
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :1 NA's :1
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: There were 11 warnings (use warnings() to see them)
warnings()
envir, enclos) :
model fit failed for Fold02: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
3: In eval(expr, envir, enclos) :
model fit failed for Fold03: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
4: In eval(expr, envir, enclos) :
model fit failed for Fold04: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
5: In eval(expr, envir, enclos) :
model fit failed for Fold05: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
6: In eval(expr, envir, enclos) :
model fit failed for Fold06: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
7: In eval(expr, envir, enclos) :
model fit failed for Fold07: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
8: In eval(expr, envir, enclos) :
model fit failed for Fold08: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
9: In eval(expr, envir, enclos) :
model fit failed for Fold09: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
10: In eval(expr, envir, enclos) :
model fit failed for Fold10: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
11: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, ... :
There were missing values in resampled performance measures.
From the error/warning messages, I figured out that the class levels have to be either 0 or 1.
Since the response, 'Class', is usually a factor, I did
wine_data_lr$Class<-as.factor(wine_data_lr$Class)
and re-ran the same code, but got the same set of errors.
Since I'm specifying family = "binomial", it could be that the class can have only two possible levels, while my data has three, which could be causing the error. So I changed it to family = "multinomial", but I still got the exact same error. How do I address this? Is there a way to convert the three levels into two binary levels, 0 and 1?
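To make the last question concrete, here is a minimal sketch of what I mean by collapsing three levels into a 0/1 response, using a one-vs-rest recoding. The toy vector `class_labels` and the name `is_class1` are made up for illustration and are not part of my actual code. Whether this is an appropriate way to handle a three-class problem is exactly what I'm unsure about.

```r
# Hypothetical toy labels standing in for wine_data_lr$Class
class_labels <- c(1, 1, 2, 3, 2, 1, 3)

# One-vs-rest recoding: 1 if the observation belongs to class 1, otherwise 0
is_class1 <- as.integer(class_labels == 1)

# Cross-tabulate the original labels against the recoded response
print(table(class_labels, is_class1))
```

This keeps the response within the 0 <= y <= 1 range that the error message asks for, but it throws away the distinction between classes 2 and 3.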
So far, I have googled and looked up
https://github.com/topepo/caret/issues/160
Train function from R caret package error: "Something is wrong; all the Accuracy metric values are missing"
R: Something is wrong; all the Accuracy metric values are missing
getting this error in Caret
"Something is wrong; all the Accuracy metric values are missing" Error in Caret Training
"Something is wrong; all the Accuracy metric values are missing:"
But I didn't quite understand how to fix my case.
Any help is appreciated!