
The variable in question is the most informative one (sorry if the language is bad, I'm a newbie), and the tree is 90% accurate (the base rate is around 86%). However, I want the algorithm to use more than one attribute. I constructed a CART tree on the same data; it used a few more of the available variables and achieved an accuracy of about 92% (all tested on a holdout set). Is there any way to force the tree to make more splits? Here is the code that I am using:

library(C50)

predictors <- subset(student_train, select = -c(10))  # drop column 10 (g3, the outcome)
dependant <- as.factor(student_train$g3)
c50fit <- C5.0(x = predictors, y = dependant, trials = 10,
               control = C5.0Control(noGlobalPruning = TRUE))

As you can see, I tried some of the suggestions I found online, but they did not work.

Here's a picture of the output:

[Image: the plot of the tree]

You can see in the code that I tried out some of the control options but none seemed to work.
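To make the comparison concrete, here is a minimal sketch of how one might check which predictors a fitted C5.0 model uses and compare tree sizes under different control settings. It uses the built-in iris data as a stand-in (not the student data), and the control values (CF = 0.9, minCases = 2) are illustrative assumptions, not settings known to fix the problem.

```r
library(C50)  # provides C5.0(), C5.0Control(), C5imp()

# Stand-in data: the built-in iris set, NOT the student data from this question
iris_predictors <- iris[, -5]
iris_outcome <- iris$Species

# Fit with the package defaults
fit_default <- C5.0(x = iris_predictors, y = iris_outcome)

# Fit with settings meant to prune less: a higher confidence factor and
# no global pruning. These values are illustrative guesses only.
fit_loose <- C5.0(
  x = iris_predictors, y = iris_outcome,
  control = C5.0Control(CF = 0.9, minCases = 2, noGlobalPruning = TRUE)
)

# Compare the tree sizes reported by the package
print(fit_default$size)
print(fit_loose$size)

# Attribute usage: the percentage of training cases that reach a split
# on each predictor
print(C5imp(fit_default, metric = "usage"))
print(C5imp(fit_loose, metric = "usage"))
```

Whether looser pruning actually produces extra splits depends on the data; if one variable separates the classes almost completely, the tree may stay small regardless of these settings.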

Here's the head of the data, produced with dput(). I hope I provided it correctly:

structure(list(
  school = structure(c(2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L),
    levels = c("GP", "MS"), class = "factor"),
  address = structure(c(2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L),
    levels = c("R", "U"), class = "factor"),
  mjob = structure(c(1L, 3L, 3L, 3L, 3L, 1L, 4L, 1L, 3L, 3L),
    levels = c("at_home", "health", "other", "services", "teacher"),
    class = "factor"),
  fjob = structure(c(4L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L, 3L),
    levels = c("at_home", "health", "other", "services", "teacher"),
    class = "factor"),
  reason = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L),
    levels = c("course", "home", "other", "reputation"), class = "factor"),
  internet = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
    levels = c("no", "yes"), class = "factor"),
  dalc = c(2L, 3L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L),
  walc = c(3L, 3L, 1L, 2L, 3L, 1L, 2L, 4L, 2L, 4L),
  g2 = c(10L, 11L, 12L, 14L, 12L, 10L, 11L, 8L, 12L, 11L),
  g3 = c("Fail", "Pass", "Pass", "Pass", "Pass", "Pass", "Pass", "Fail",
    "Pass", "Pass")),
  row.names = c(580L, 545L, 98L, 378L, 85L, 113L, 645L, 178L, 27L, 384L),
  class = "data.frame")
  • Can you make your post [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by providing your data? – jrcalabrese Dec 14 '22 at 20:15
  • I provided the first ten observations because the dataset is quite large (600 observations). If it's needed, I can provide a bigger subset. – juliusz799 Dec 14 '22 at 20:44
