C5.0 number of boosting iterations stops early

Question

I suspect this comes down to back-end behavior that I don't understand, because what I'm seeing seems odd, at least to me.

When I run a C5.0 model with an (albeit extreme) error matrix of:

error_cost <- matrix(c(0, 1, 15, 0), nrow = 2)

and 10 trials I get 10 iterations.

If I do everything the same and up the trials anywhere between 11 and 100 it stops early at 7 iterations, and the output, while "working", is garbage.

If I change the error matrix to:

error_cost <- matrix(c(0, 1, 4, 0), nrow = 2)

and up the iterations to 100 it iterates 100 times (and the results are really good).

Obviously my problem is in the error cost, but I'm just trying to understand why it causes it to behave this way. And while this is a real problem I'm working on, the error costs and iterations are more an attempt to understand what is happening under the hood.

Thoughts?

Thanks in advance.

Full code:

library(C50)

model_data_train$Donated <- as.factor(model_data_train$Donated)
model_data_test$Donated <- as.factor(model_data_test$Donated)

error_cost <- matrix(c(0, 4, 1, 0), nrow = 2)

dt_model10 <- C5.0(model_data_train[-113], model_data_train$Donated, 
                   trials = 100, 
                   rules = TRUE, costs = error_cost)

You haven't supplied any runnable code, so it's impossible for anyone to reproduce your results to test what might be going on. It's not even clear what R package and functions you are using. Please edit your question to include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — MrFlick, Jun 03 '14 at 14:16
When you add a tag to a question, check the tooltip to make sure it is appropriate in the context of your question. That way it gets exposed to the right people. :) — jbaums, Jun 03 '14 at 14:26
I added as much code as I can. Also, I searched for more specific tags but nothing came up. — Frank B., Jun 03 '14 at 14:34
That's fine - just mentioned it because `boost` refers to the c++ libraries. — jbaums, Jun 03 '14 at 14:37
There's an `earlyStopping` option which you can set to FALSE if you want to have the exact number of boosting iterations. — Kenston Choi, Jun 05 '14 at 03:37

AidanGawronski · Answer 1 · 2016-03-17T03:05:35.000

0

The C50 documentation describes a control option called earlyStopping, set through C5.0Control(), which toggles the internal rule that stops boosting before the requested number of trials. You can turn it off:

dt_model10 <- C5.0(model_data_train[-113], model_data_train$Donated,
                   trials = 100,
                   rules = TRUE,
                   costs = error_cost,
                   control = C5.0Control(earlyStopping = FALSE))

as mentioned by @Kenston Choi
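
To check whether early stopping actually kicked in, it may help to compare the requested and actual number of boosting iterations on the fitted object. The sketch below is only an illustration: it uses the built-in iris data as a stand-in for model_data_train, and it assumes the fitted object exposes a `trials` element with `Requested` and `Actual` entries, which is how the C50 versions I have seen report it.

```r
library(C50)

# Illustrative data only: iris stands in for model_data_train.
x <- iris[, -5]
y <- iris$Species

# Default control: boosting may stop before the requested number of trials.
fit_default <- C5.0(x, y, trials = 50)

# Same call with the internal stopping rule switched off.
fit_full <- C5.0(x, y, trials = 50,
                 control = C5.0Control(earlyStopping = FALSE))

# Compare requested vs. actual iterations for each fit
# (assumes `trials` is a named vector with Requested/Actual).
print(fit_default$trials)
print(fit_full$trials)
```

If `Actual` is lower than `Requested` in the first fit, the internal stopping rule ended boosting early. Forcing every trial does not make the later trials useful, so the results are still worth checking on held-out data.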

edited Mar 17 '16 at 03:05

answered Mar 17 '16 at 02:56

AidanGawronski

