
I am using the ctree function from the party R package. I would like to identify all predictors that are used within the tree in order to reduce the dimension of the data.frame used for further analyses. For example:

library(party)
data(airquality)
myModel<-ctree(Ozone~., data=na.omit(airquality))
plot(myModel)

I would like a function that takes myModel and returns Temp, Wind and Ozone.

Giorgio Spedicato

2 Answers


Just for completeness: The answer by NicE pertains to the ctree() implementation in the party package. If someone wants to do the same thing based on the new (and recommended) implementation in the partykit package, then a different function is necessary because the internal representation completely changed.

getUsefulPredictors <- function(x) {
  varid <- nodeapply(x, ids = nodeids(x),
    FUN = function(n) split_node(n)$varid)
  varid <- unique(unlist(varid))
  names(data_party(x))[varid]
}

This first obtains the variable ID varid of the split in each node of the tree (terminal nodes have no split and contribute nothing). Then the names of the model frame are obtained, and those matching the unique variable IDs are returned. In your example:

library("partykit")
myModel <- ctree(Ozone ~ ., data = na.omit(airquality))
getUsefulPredictors(myModel)    
## [1] "Temp" "Wind"
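
To tie this back to the goal in the question (shrinking the data.frame for further analyses), here is a minimal sketch of one possible follow-up. The names aq, used and reduced are only illustrative, and the response name "Ozone" is hard-coded on the assumption that you know it from the formula:

```r
library("partykit")

# Same helper as above, repeated so this block runs on its own
getUsefulPredictors <- function(x) {
  varid <- nodeapply(x, ids = nodeids(x),
    FUN = function(n) split_node(n)$varid)
  varid <- unique(unlist(varid))
  names(data_party(x))[varid]
}

aq <- na.omit(airquality)
myModel <- ctree(Ozone ~ ., data = aq)

# Predictors that appear in at least one split
used <- getUsefulPredictors(myModel)

# Keep only the response and the predictors used by the tree
reduced <- aq[, c("Ozone", used), drop = FALSE]
str(reduced)
```

Note that getUsefulPredictors() returns only the splitting variables, so the response has to be added back by hand if you want to keep it in the reduced data.frame.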
Achim Zeileis

You can try using this:

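getUsefulPredictors<-function(x){
  # flatten the nested node list stored in the tree slot into a named vector
  flatTree<-unlist(x@tree)
  # keep the entries whose name contains "variableName" (matched literally)
  pred<-unique(flatTree[grepl("variableName",names(flatTree),fixed=TRUE)])
  return(pred)
}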

It flattens the tree into a named vector and keeps the elements that have variableName in their name.

Run on your model, it returns:

getUsefulPredictors(myModel)
#[1] "Temp" "Wind"
NicE