
[Dataset] I am a newbie working on the Titanic problem. I was about to train models on the dataset below, and that is where I got stuck:

[data_prepro_maf_train]

all_model <- modelLookup()
classification_model <- all_model %>% filter(forClass == TRUE, !duplicated(model))
class_model <- classification_model$model
set.seed(123)
number <- 3
repeats <- 2
control <- trainControl(
  method = "repeatedcv", number = number, repeats = repeats,
  classProbs = TRUE, savePredictions = "final",
  index = createResample(data_prepro_maf_train$Embarked, repeats * number),
  summaryFunction = multiClassSummary, allowParallel = TRUE
)
x <- data_prepro_maf_train[, c(1, 3, 5, 6, 7, 8)]
y <- data_prepro_maf_train[, 12]
levels(y) <- make.names(levels(factor(data_prepro_maf_train[, 12])))
y <- make.names(data_prepro_maf_train[, 12], unique = TRUE, allow_ = TRUE)
# Train the models
model_list1 <- caretList(x, y, data = data_prepro_maf_train, trControl = control,
                         metric = "Accuracy", methodList = class_model[1])

I made sure to pick only columns without missing values (leaving out columns such as "Cabin"), and I had already removed the missing values from the required columns.

Packages used:

library(caret)
library(caretEnsemble)
library(tidyverse)
library(magrittr)
library(doParallel)
Jabby
  • Could you provide a sample of your data with `dput(head(df,n))`? – NelsonGon Feb 17 '20 at 02:02
  • Hi NelsonGon. Attached a link to the dataset. – Jabby Feb 17 '20 at 03:20
  • @Jabby Can you provide the object `all_model`? It is missing from your question, so I am unable to go ahead. Which libraries have you loaded? Please show those as well. – UseR10085 Feb 17 '20 at 04:36
  • all_model<-modelLookup() was added. Added the library packages as well – Jabby Feb 17 '20 at 05:01
  • You are using all the classification models available in `caret` package. So, training will take time. You can see [this post](https://stackoverflow.com/questions/51548255/caret-there-were-missing-values-in-resampled-performance-measures) – UseR10085 Feb 17 '20 at 06:38
  • Not quite. In fact, in the last line of the code I specify class_model[1], so only one model is trained. Yet it stopped and gave me the error. – Jabby Feb 17 '20 at 22:33

1 Answer


I spent some time researching the problem, hence the hiatus. Possible solutions to my problem are:

1) One-hot encoding: a preprocessing step that converts the factor columns of the training data into numeric 0/1 indicator columns

2) Argument input method (x/y interface versus formula interface):

x<-data_prepro_maf_train[,c(1,3,5,6,7,8)]
y<-data_prepro_maf_train[,12]
model_list1<-caretList(x,y,data=data_prepro_maf_train,trControl = control,metric="Accuracy",methodList = class_model[1])

I changed it to the formula interface (y ~ X1 + X2 + X3), and at least now caretList is running some models. See the discussion on the formula vs. non-formula interface in train.
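The difference between the two interfaces can be sketched in base R. As far as I understand it, the formula interface expands factor predictors into numeric 0/1 indicator columns in roughly the way model.matrix() does, while the x/y interface passes the columns through unchanged. The toy data frame below is made up for illustration and is not the Titanic data:

```r
# Toy data (hypothetical, not the Titanic data)
toy <- data.frame(
  Sex = factor(c("male", "female", "female", "male")),
  Age = c(22, 38, 26, 35)
)

# Formula-style expansion: the factor becomes numeric indicator columns.
# "- 1" drops the intercept so that every level of Sex gets its own column.
expanded <- model.matrix(~ Sex + Age - 1, data = toy)
colnames(expanded)
# "Sexfemale" "Sexmale" "Age"

# x/y-style input: the factor column is handed over as a factor
sapply(toy, class)
```

Models that only accept numeric predictors may fail on the second form, which would be consistent with the formula call getting further than the x/y call.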

Below are the changes made:

#Let’s one hot encode the data_prepro_maf_train data
dummy_model1<-dummyVars(title~.,data=data_prepro_maf_train[c(1,2,3,5,6,7,8,10)])

data_train_mat1<-predict(dummy_model1,newdata=data_prepro_maf_train)

data_prepro_maf_train2<-data.frame(data_train_mat1)

#Add back columns “title” and “Embarked”, which have vital factors for the model
data_prepro_maf_train2<-cbind(data_prepro_maf_train$Embarked,data_prepro_maf_train$title,data_prepro_maf_train2)

colnames(data_prepro_maf_train2)[1]<-"Embarked"
colnames(data_prepro_maf_train2)[2]<-"title"
#Drop unused factor levels from the outcome in the new train data. Otherwise caret can stop with:
#"Error: One or more factor levels in the outcome has no data: 'Q'"
#droplevels() returns a new factor, so its result has to be assigned back to the column
#(assigning it to levels() instead does not save the cleaned factor):

data_prepro_maf_train2$Embarked<-droplevels(data_prepro_maf_train2$Embarked)
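
A minimal base R sketch of the "factor level has no data" situation, using a made-up outcome vector rather than the real Embarked column. It shows that droplevels() returns a cleaned copy, so the result only sticks if it is assigned back to the variable:

```r
# Toy outcome (hypothetical): level "Q" is declared but never observed
embarked <- factor(c("S", "C", "S", "C"), levels = c("C", "Q", "S"))
table(embarked)            # "Q" has a count of 0

# droplevels() does not modify its argument; assign the result back
embarked <- droplevels(embarked)
levels(embarked)           # "C" "S"
```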

set.seed(123)
number<-3
repeats<-2
control<-trainControl(method="repeatedcv",number=number,repeats=repeats,classProbs = TRUE,savePredictions = "all",index=createResample(data_prepro_maf_train$Embarked,repeats*number),summaryFunction = multiClassSummary,allowParallel = TRUE)
#class_model has over 100 models, so let's select a few that we know to test for the previous error.
#I stumbled upon preProcess = c("center", "scale"), which is said to help in my situation.
#I am not sure how it works and would appreciate it if someone could explain it:
model_list2<-caretList(Embarked~title+Pclass+Age+Sex.male+Sex.female+SibSp+Parch,data=data_prepro_maf_train2,preProcess = c("center", "scale"),trControl = control,metric="Accuracy",methodList = class_model[c(37,52,55,68,102,145,167,189)])
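
On the preProcess question: "center" subtracts each predictor's mean and "scale" divides by its standard deviation, so every numeric predictor ends up with mean 0 and standard deviation 1. The sketch below reproduces that arithmetic in base R on made-up values. My understanding is that caret estimates the mean and standard deviation from the training rows and reuses them on held-out rows, but the code here only illustrates the transformation itself:

```r
# Toy predictor (hypothetical values)
age <- c(22, 38, 26, 35, 54)

# "center": subtract the mean; "scale": divide by the standard deviation
age_std <- (age - mean(age)) / sd(age)

# Base R's scale() performs the same computation
all.equal(age_std, as.numeric(scale(age)))

mean(age_std)  # approximately 0
sd(age_std)    # 1
```

Putting predictors on a common scale tends to matter for methods that rely on distances or gradient-based fitting, and matters little for tree-based methods.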

I am not confident that this is the end of my problem, but at least the models are now running instead of stopping without any results.
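
One more thing that may be worth checking in the original x/y code from the question: y was built with make.names(..., unique = TRUE). With unique = TRUE, repeated values get numeric suffixes, so the outcome no longer contains repeated class labels. The sketch below uses made-up values to show the effect, along with one possible alternative that keeps the labels intact:

```r
# Toy outcome (hypothetical values)
embarked <- c("S", "C", "S", "Q", "C")

# unique = TRUE appends suffixes to repeated values, so no two entries match
y_unique <- make.names(embarked, unique = TRUE)
y_unique
# "S" "C" "S.1" "Q" "C.1"
length(unique(y_unique))   # 5 distinct "classes" for 5 rows

# Without unique = TRUE the class labels are kept; factor() makes it a class outcome
y_ok <- factor(make.names(embarked))
levels(y_ok)               # "C" "Q" "S"
```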

Jabby