I'm trying to stack an ensemble of predictions using caretStack, with leave-one-out cross-validation (LOOCV). Here is my script:
library(readr)
library(caretEnsemble)
# Using the wine quality dataset as an example:
raw <- read_delim('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv',
                  delim = ";", escape_double = FALSE, trim_ws = TRUE)
df <- raw[1:10, ]  # keep only the first 10 rows
Since the LOOCV method is not explicitly offered in the trainControl function, I have to specify the index and indexOut arguments myself. I came up with the following:
holdout <- list()
for (i in 1:nrow(df)) {
  holdout[[i]] <- i
}
my_control <- trainControl(
  savePredictions = 'final',
  classProbs = FALSE,
  index = rep(list(seq(1, nrow(df))), times = nrow(df)),
  indexOut = holdout
)
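For comparison, here is a minimal base-R sketch of leave-one-out resampling indices. It assumes caret's documented convention that each element of `index` lists the rows used for *training* in that resample and each element of `indexOut` lists the held-out rows. Note that in the script above every element of `index` is `seq(1, nrow(df))`, so each held-out row is also part of its own training set. The helper name `loocv_indices` is hypothetical.

```r
# Hypothetical helper: build leave-one-out resampling indices for n rows.
# index[[i]]    -> rows used for training in resample i (every row except i)
# indexOut[[i]] -> the single row held out in resample i
loocv_indices <- function(n) {
  rows <- seq_len(n)
  list(
    index    = lapply(rows, function(i) setdiff(rows, i)),
    indexOut = as.list(rows)
  )
}

idx <- loocv_indices(10)
str(idx$index[1:2])  # first two training sets, 9 rows each

# These lists could then be passed to caret::trainControl(), e.g.
# trainControl(savePredictions = "final", index = idx$index, indexOut = idx$indexOut)
```

Even with this change, each resample still holds out a single row, which matters for how per-resample metrics are computed.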
model_list <- caretList(
  quality ~ .,
  data = df,
  trControl = my_control,
  methodList = c('glm', "gaussprLinear")
)
Here, however, I get the warning:
Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures
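A likely reason for this warning is that every resample holds out exactly one row. As far as I understand, caret's default regression metrics (`caret::postResample`) compute R² as the squared correlation between predictions and observations, which is undefined for a single point, while RMSE and MAE are still defined (and equal to each other, which would match the identical RMSE and MAE values in the resampling summary that caretStack prints). The sketch below mimics that computation in simplified base R; `fold_metrics` is a hypothetical helper, not caret's actual code.

```r
# Hypothetical helper mirroring, in simplified form, per-resample regression
# metrics: RMSE, MAE, and R-squared as the squared Pearson correlation.
fold_metrics <- function(pred, obs) {
  # A correlation needs at least two distinct values in each vector
  r <- if (length(unique(pred)) < 2 || length(unique(obs)) < 2) {
    NA_real_
  } else {
    cor(pred, obs)
  }
  c(RMSE = sqrt(mean((pred - obs)^2)),
    Rsquared = r^2,
    MAE = mean(abs(pred - obs)))
}

# One held-out row (as in each LOOCV resample): R-squared is NA,
# and RMSE equals MAE because there is only one error term.
fold_metrics(pred = 5.54, obs = 6)

# Several held-out rows: all three metrics are defined.
fold_metrics(pred = c(5.5, 6.2, 5.1, 6.8), obs = c(6, 6, 5, 7))
```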
And when running caretStack, I get an error:
glm_ensemble <- caretStack(
  model_list,
  method = "glm",
  metric = "Rsquared",
  trControl = my_control
)
Something is wrong; all the Rsquared metric values are missing:
      RMSE           Rsquared        MAE
 Min.   :0.4614   Min.   : NA   Min.   :0.4614
 1st Qu.:0.4614   1st Qu.: NA   1st Qu.:0.4614
 Median :0.4614   Median : NA   Median :0.4614
 Mean   :0.4614   Mean   :NaN   Mean   :0.4614
 3rd Qu.:0.4614   3rd Qu.: NA   3rd Qu.:0.4614
 Max.   :0.4614   Max.   : NA   Max.   :0.4614
                  NA's   :1
Error: Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
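Because each LOOCV resample yields a single prediction, one common alternative is to pool all held-out predictions first and compute R² once over the full set. The sketch below assumes a data frame shaped like the `pred` element that caret stores on a fitted `train` object when `savePredictions = 'final'` (columns such as `pred`, `obs` and `Resample`); the values are made up for illustration, and `pooled_metrics` is a hypothetical helper.

```r
# Illustrative (made-up) held-out predictions, one row per LOOCV resample
holdout_preds <- data.frame(
  Resample = sprintf("Fold%02d", 1:5),
  pred     = c(5.6, 6.1, 5.2, 6.7, 5.9),
  obs      = c(6, 6, 5, 7, 6)
)

# Hypothetical helper: compute metrics once over all pooled held-out rows
pooled_metrics <- function(preds) {
  err <- preds$pred - preds$obs
  c(RMSE = sqrt(mean(err^2)),
    Rsquared = cor(preds$pred, preds$obs)^2,
    MAE = mean(abs(err)))
}

pooled_metrics(holdout_preds)
```

Pooling gives R² enough points to be defined, at the cost of no longer having a per-resample distribution of that metric.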
I assume there is something wrong with the way I set up the index and indexOut arguments, but I'm not sure. Any help would be appreciated.