I am attempting to build a model to predict whether a product will be sold on an e-commerce website, with 1 or 0 as the output.
My data consists of a handful of categorical variables (one with a large number of levels), a couple of binary variables, and one continuous variable (the price). The output variable is 1 or 0, indicating whether or not the product listing was sold.
This is my code:
inTrainingset<-createDataPartition(C$Sale, p=.75, list=FALSE)
CTrain<-C[inTrainingset,]
CTest<-C[-inTrainingset,]
gbmfit<-gbm(Sale~., data=C, distribution="bernoulli", n.trees=5, interaction.depth=7, shrinkage=.01)
plot(gbmfit)
gbmTune<-train(Sale~.,data=CTrain, method="gbm")
ctrl<-trainControl(method="repeatedcv",repeats=5)
gbmTune<-train(Sale~.,data=CTrain,
method="gbm",
verbose=FALSE,
trControl=ctrl)
ctrl<-trainControl(method="repeatedcv", repeats=5, classProbs=TRUE, summaryFunction = twoClassSummary)
gbmTune<-trainControl(Sale~., data=CTrain,
method="gbm",
metric="ROC",
verbose=FALSE ,
trControl=ctrl)
grid<-expand.grid(.interaction.depth=seq(1,7, by=2), .n.trees=seq(100,300, by=50), .shrinkage=c(.01,.1))
gbmTune<-train(Sale~., data=CTrain,
method="gbm",
metric="ROC",
tunegrid= grid,
verebose=FALSE,
trControl=ctrl)
set.seed(1)
gbmTune <- train(Sale~., data = CTrain,
method = "gbm",
metric = "ROC",
tuneGrid = grid,
verbose = FALSE,
trControl = ctrl)
I am running into two issues. The first is that when I add summaryFunction = twoClassSummary and then tune, I get this:
Error in trainControl(Sale ~ ., data = CTrain, method = "gbm", metric = "ROC", :
unused arguments (data = CTrain, metric = "ROC", trControl = ctrl)
The second problem arises if I bypass the summaryFunction: when I try to run the model, I get this error:
Error in evalSummaryFunction(y, wts = weights, ctrl = trControl, lev = classLevels, :
train()'s use of ROC codes requires class probabilities. See the classProbs option of trainControl()
In addition: Warning message:
In train.default(x, y, weights = w, ...) :
cannnot compute class probabilities for regression
I tried changing the output variable from a numeric value of 1 or 0 to a text value in Excel, but that didn't make a difference.
Any help would be greatly appreciated, either with stopping train() from treating this model as a regression or with the first error message I am encountering.
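For reference, a minimal sketch of one thing that may be worth checking for the regression warning: whether Sale is actually a factor inside R. The small data frame below is a hypothetical stand-in for C (only Price and Sale are included, with made-up values), and the level names "NotSold" and "Sold" are my own choice. As far as I understand, train() fits a regression when the outcome is numeric, and classProbs=TRUE needs factor levels that are valid R names, which "0" and "1" are not.

```r
# Hypothetical stand-in for the data frame C; the real columns are not shown here.
C <- data.frame(
  Price = c(10.5, 20.0, 15.25, 8.0),
  Sale  = c(1, 0, 1, 0)
)

# Convert the numeric 0/1 outcome to a factor so it is treated as a class label.
# The labels are illustrative; any syntactically valid R names should do.
C$Sale <- factor(C$Sale, levels = c(0, 1), labels = c("NotSold", "Sold"))

# Inspect the result: Sale should now be a two-level factor.
str(C$Sale)
levels(C$Sale)
```

The conversion would need to happen before createDataPartition() so that CTrain and CTest both inherit the factor. Separately, the call that raised the first error uses trainControl() where every other tuning call uses train(), and the arguments listed as unused (data, metric, trControl) are ones passed to train() elsewhere in the code.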
Best,
Will will@nubimetrics.com