
I am using the RWeka package in R to fit M5' trees to a dataset with M5P. I then want to convert each fitted tree into a "party" tree so that I can access variable importances. However, I can't get the function as.party to work; it fails with the following error:

"Error: all(sapply(split, head, 1) %in% c("<=", ">")) is not TRUE"

The error only arises when I call the function inside a for loop, but I need the loop because I am running 5-fold cross-validation.

Below is the code I have been running:

library(RWeka)     # M5P(), Weka_control()
library(partykit)  # as.party()

n <- nrow(data)
k <- 5

# Randomly assign each observation to one of the k folds
indCV <- sample(rep(1:k, each = ceiling(n / k)), n)

for (i in 1:k) {
  # Training data: all observations where indCV is not equal to i
  training_data <- data.frame(x[indCV != i, ])
  training_response <- y[indCV != i]

  # Test data: the fifth of the observations where indCV is equal to i
  test_data <- x[indCV == i, ]
  test_response <- y[indCV == i]

  # Fit a model to the training data
  # (note: Weka's -N option, i.e. N = TRUE, requests an unpruned tree)
  fit <- M5P(training_response ~ ., data = training_data, control = Weka_control(N = TRUE))

  # Convert to party
  p <- as.party(fit)
}
MichaelChirico
  • Possible duplicate of [How do you plot a CostSensitiveClassifier tree in R?](https://stackoverflow.com/questions/24420191/how-do-you-plot-a-costsensitiveclassifier-tree-in-r) – Jim G. Feb 22 '18 at 13:21

1 Answer


The RWeka package has an example for converting M5P trees into party objects. If you run example("M5P", package = "RWeka") then the tree visualizations are actually drawn by partykit. After running the examples, see plot(m3) and as.party(m3).
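As a minimal sketch of those steps (assuming RWeka, partykit, and a working Java installation are available, and that the examples create the object m3 in the workspace as described above; the name p3 is only illustrative):

```r
library(RWeka)     # M5P() and its documented examples
library(partykit)  # as.party() and the plot method for party objects

# Run the documented examples; this should create m3, an M5P fit
example("M5P", package = "RWeka", ask = FALSE)

# Draw the tree via partykit and convert it explicitly
plot(m3)
p3 <- as.party(m3)
class(p3)
```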

However, while J48 trees can be converted into fully fledged constparty objects, the same is not true for M5P. For M5P, the tree structure itself can be converted to party, but the linear models within the nodes are not straightforward to convert into lm objects. Thus, you can use the party representation to compute measures that depend only on the tree structure (e.g., variables used for splitting, number of splits, split points). For measures that depend on the models or the predictions (e.g., mean squared errors), the party class won't be of much help.
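To illustrate the kind of structure-only measure that remains available, here is a rough sketch using the cpu.arff data bundled with RWeka. The object names (cpu, m5, p, inner_ids, split_vars, split_points) are made up for this example, and counting splits per variable is only a crude stand-in, not a proper variable importance measure:

```r
library(RWeka)     # M5P(), read.arff()
library(partykit)  # as.party(), nodeids(), nodeapply(), split_node(), ...

# Fit an M5P tree to a data set shipped with RWeka and convert it
cpu <- read.arff(system.file("arff", "cpu.arff", package = "RWeka"))
m5 <- M5P(class ~ ., data = cpu)
p <- as.party(m5)

# Inner (splitting) nodes are all nodes that are not terminal
inner_ids <- setdiff(nodeids(p), nodeids(p, terminal = TRUE))

# Name of the split variable at each inner node
split_vars <- vapply(
  nodeapply(p, ids = inner_ids,
            FUN = function(node) varid_split(split_node(node))),
  function(id) names(p$data)[id],
  character(1)
)

# Split points at each inner node (kept as a list)
split_points <- nodeapply(p, ids = inner_ids,
                          FUN = function(node) breaks_split(split_node(node)))

# How often each variable is used for splitting
table(split_vars)
```

Inside a cross-validation loop, the same extraction could be applied to each fold's tree, provided as.party succeeds for that fold.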

Achim Zeileis