
I want to perform a linear regression on the wine data set using 10-fold cross-validation, but my class variable has 3 levels: '1', '2' and '3'.

This is the code so far.

require(boot, quietly = TRUE)
require(caret)

# wine.data is the wine data from https://archive.ics.uci.edu/ml/machine-learning-databases/wine/
# The file is comma-separated and has no header row
wine_data <- read.csv("wine.data", header = FALSE)


colnames(wine_data)<-c("Class","Alcohol","Malic acid", "Ash", "Alcalinity of ash", "Magnesium","Total phenols", "Flavanoids", "Nonflavanoid phenols","Proanthocyanins","Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline")      

wine_data_lr<- wine_data

wine_data_lr$Class<-as.numeric(wine_data_lr$Class)
wine_data_lr$Magnesium<-as.numeric(wine_data_lr$Magnesium)
wine_data_lr$Proline<-as.numeric(wine_data_lr$Proline)

str(wine_data_lr)

'data.frame':   178 obs. of  14 variables:
 $ Class                       : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Alcohol                     : num  14.2 13.2 13.2 14.4 13.2 ...
 $ Malic acid                  : num  1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
 $ Ash                         : num  2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
 $ Alcalinity of ash           : num  15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
 $ Magnesium                   : num  127 100 101 113 118 112 96 121 97 98 ...
 $ Total phenols               : num  2.8 2.65 2.8 3.85 2.8 3.27 2.5 2.6 2.8 2.98 ...
 $ Flavanoids                  : num  3.06 2.76 3.24 3.49 2.69 3.39 2.52 2.51 2.98 3.15 ...
 $ Nonflavanoid phenols        : num  0.28 0.26 0.3 0.24 0.39 0.34 0.3 0.31 0.29 0.22 ...
 $ Proanthocyanins             : num  2.29 1.28 2.81 2.18 1.82 1.97 1.98 1.25 1.98 1.85 ...
 $ Color intensity             : num  5.64 4.38 5.68 7.8 4.32 6.75 5.25 5.05 5.2 7.22 ...
 $ Hue                         : num  1.04 1.05 1.03 0.86 1.04 1.05 1.02 1.06 1.08 1.01 ...
 $ OD280/OD315 of diluted wines: num  3.92 3.4 3.17 3.45 2.93 2.85 3.58 3.58 2.85 3.55 ...
 $ Proline                     : num  1065 1050 1185 1480 735 ...

ctrl <- trainControl(method = "cv", number = 10, savePredictions = TRUE)

lr_mod_fit <- train(Class ~ .,  data=wine_data_lr, method="glm", family="binomial",trControl = ctrl, tuneLength = 5)

      RMSE        Rsquared  
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1     NA's   :1    
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: There were 11 warnings (use warnings() to see them)

warnings()

envir, enclos) :
  model fit failed for Fold02: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

3: In eval(expr, envir, enclos) :
  model fit failed for Fold03: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

4: In eval(expr, envir, enclos) :
  model fit failed for Fold04: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

5: In eval(expr, envir, enclos) :
  model fit failed for Fold05: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

6: In eval(expr, envir, enclos) :
  model fit failed for Fold06: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

7: In eval(expr, envir, enclos) :
  model fit failed for Fold07: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

8: In eval(expr, envir, enclos) :
  model fit failed for Fold08: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

9: In eval(expr, envir, enclos) :
  model fit failed for Fold09: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

10: In eval(expr, envir, enclos) :
  model fit failed for Fold10: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

11: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  ... :
  There were missing values in resampled performance measures.

From the error/warning messages, I figured out that the class levels have to be either 0 or 1.

Since the outcome, 'Class', is usually a factor, I ran
wine_data_lr$Class<-as.factor(wine_data_lr$Class) and re-ran the same code, but got the same set of errors.

Since I'm specifying family = "binomial", it could be that the class can have only two possible levels, while my data has three, which could be causing the error. So I changed it to family = "multinomial", but I still got the exact same error. How do I address this? Is there a way to convert three levels into two binary levels, 0 and 1?
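To illustrate what I mean by converting the three levels into binary ones, here is a minimal sketch of a one-vs-rest recoding in base R. The vector `class_labels` and the column names `is_class_1`, `is_class_2`, `is_class_3` are made up for illustration and are not part of my actual code, and I am not sure this is the right approach for my model.

```r
# Hypothetical stand-in for wine_data_lr$Class (values 1, 2, 3)
class_labels <- c(1, 1, 2, 3, 2, 3)

# One-vs-rest recoding: build one 0/1 indicator column per class level
levels_found <- sort(unique(class_labels))
indicators <- sapply(levels_found, function(level) {
  as.integer(class_labels == level)
})
colnames(indicators) <- paste0("is_class_", levels_found)

# Each row has exactly one 1, marking the class that observation belongs to
print(indicators)
```

Each indicator column could then serve as a 0/1 outcome on its own, which is the shape the "y values must be 0 <= y <= 1" message seems to be asking for.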

So far, I have googled and looked at https://github.com/topepo/caret/issues/160
and at these questions:
Train function from R caret package error: "Something is wrong; all the Accuracy metric values are missing"
R: Something is wrong; all the Accuracy metric values are missing
getting this error in Caret
"Something is wrong; all the Accuracy metric values are missing" Error in Caret Training
"Something is wrong; all the Accuracy metric values are missing:"

But I didn't quite understand how to fix my case.

Any help is appreciated!

milos.ai
DJ_Stuffy_K
