I have some code where I fit a tree and then automatically prune it back by selecting the complexity parameter that minimizes the cross-validation error, as displayed by the printcp() function. When reading my console output, I am annoyed by the mass of text that printcp() prints.

What I do is convert the output of printcp() to a data frame and then use some logic to extract the CP value with the lowest CV error. Is there any way I can do this WITHOUT printing the output of printcp() to the console?

  df_tree_1 <- rpart(formula(df_lm_2), cp = 0.0001, data = train)
  cp_df <- data.frame(printcp(df_tree_1))
  df_tree_1 <- prune.rpart(tree = df_tree_1, cp = cp_df$CP[which(cp_df$xerror == min(cp_df$xerror))])
goldisfine
  • The `printcp` function doesn't seem to be in base. If you are using non-standard packages you should include the necessary `library()` calls in the code. Additionally, you should provide sample data to make the problem [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so we can copy and paste the code into R to re-create the same error you get. That makes it much easier to help you and test any possible solutions. – MrFlick Jul 31 '14 at 17:56

1 Answer

The tree object returned by rpart() stores the complexity-parameter table in its "cptable" component, and that table holds the value you're looking for. printcp() just displays this table, so you can skip it and read the value straight from the fitted object when you call prune(). Here's how you could do that:

library(rpart)  # for the rpart function
library(rattle) # for "weather" dataset and for "fancy" tree plotter

# fit model using rpart
fit <- rpart(RainTomorrow ~ Rainfall + Evaporation + Sunshine + WindGustDir, 
             data = weather,
             method = "class")

# visualize with rattle
fancyRpartPlot(fit)

# prune by returning the value in the column of fit$cptable (a table)
# corresponding to the row that has the minimum "xerror" value
fit_autoprune <- prune(tree = fit,
                       cp = fit$cptable[which.min(fit$cptable[, "xerror"]),
                                        "CP"])

# visualize again to see difference
fancyRpartPlot(fit_autoprune)
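
If you prune several trees this way, the same lookup can be wrapped in a small helper. This is only a sketch: `autoprune` is a name I made up, not an rpart function, and the example uses the `kyphosis` data that ships with rpart so it runs without rattle. It assumes the tree was fit with cross-validation enabled (the default), since otherwise the table has no "xerror" column.

```r
library(rpart)

# Hypothetical helper (not part of rpart): prune a fitted tree at the CP
# whose cross-validated error ("xerror") is lowest, without printing anything.
autoprune <- function(fit) {
  cp_table <- fit$cptable                         # same table printcp() shows
  best_row <- which.min(cp_table[, "xerror"])     # first row with minimum xerror
  prune(fit, cp = cp_table[best_row, "CP"])
}

set.seed(42)  # cross-validation folds are random, so fix the seed

# Grow a deliberately large tree, as in the question (cp = 0.0001)
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis,
             method = "class",
             cp = 0.0001)

pruned <- autoprune(fit)
```

One difference from the code in the question: which.min() returns only the first minimum, so if several rows tie for the lowest xerror you get a single CP value, whereas `which(x == min(x))` can return more than one.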