
I am working with the C50 package in R. This algorithm trains boosted decision trees with a customizable number of trials, and I wish to predict the outcome of each of these trials.

The package has a "predict" method, but it predicts using either all the trials or only the first n trials. It does not allow predicting with each trial separately.
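To illustrate the cumulative behaviour described above, here is a minimal sketch. It assumes the built-in iris data as a stand-in (not the churn data used below), and the names fit, n_actual and cum_probs are only illustrative. Each call to predict uses the first k trials together, not the k-th trial alone:

```r
library(C50)

# iris is a stand-in dataset for illustration only (an assumption)
set.seed(10)
fit <- C5.0(x = iris[, -5], y = iris$Species, trials = 4,
            control = C5.0Control(earlyStopping = FALSE))

# Number of boosting trials actually kept in the fitted model
n_actual <- fit$trials["Actual"]

# Prediction from the ensemble made of the first k trials, for k = 1..n_actual
cum_probs <- lapply(seq_len(n_actual), function(k) {
  predict(fit, newdata = iris[, -5], trials = k, type = "prob")
})
```

Each element of cum_probs is a matrix of class probabilities with one row per observation, so none of them isolates the contribution of a single trial.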

One way I found to solve this is doing the following:


#Load libraries
library(C50)
library(party)
library(rpart)

#Load data
data(churn)

# Train a boosted model with four trials
set.seed(10)
tree.model <- C5.0(x = churnTrain[, -20],
                   y = churnTrain$churn,
                   trials = 4,
                   control = C5.0Control(noGlobalPruning = TRUE,
                                         earlyStopping = FALSE))

# Convert each trial to a separate object of class party
# (the trial argument is 0-based, hence i - 1)
A <- list()
for (i in seq_len(tree.model$trials["Actual"])) {
  A[[i]] <- partykit::as.party(tree.model, trial = i - 1)
}

# Predict the outcome of each separate trial
Z <- list()
for (i in seq_len(tree.model$trials["Actual"])) {
  Z[[i]] <- predict(A[[i]], churnTest[, -20], type = "prob")
  print(Z[[i]][1, ])  # class probabilities for the first test row
}

However, there is an alternative way of training the C5.0 trees, using the formula interface:

formula2 <- churn ~ .
set.seed(10)
tree.model <- C5.0(formula2,
                   data = churnTrain,
                   trials = 4,
                   control = C5.0Control(noGlobalPruning = TRUE,
                                         earlyStopping = FALSE))

When I train the C5.0 trees like this, I can't convert each trial to a separate object of class party:

z1 <- partykit::as.party(tree.model, trial = 1)
Error in `[.data.frame`(mf, rsp) : undefined columns selected

So, this raises two main questions:

  1. Is there another way to predict the outcome of the individual trials of a C5.0 object in R?
  2. Why does the method I propose work for the first formulation of the C5.0 model (x and y arguments), but not for the second one (formula interface)? (The same problem, with the same error, also occurs when trying to plot each trial using the package's "plot" method.)

Thank you
