
I am trying to calculate the Gini index in R. For a binary decision tree split this works without problems, using the following function:

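gini_process <- function(classes, splitvar = NULL) {
  # Assumes splitvar is a logical vector (FALSE = "no" node, TRUE = "yes" node)
  if (is.null(splitvar)) {
    # No split given: Gini impurity of the whole sample
    base_prob <- table(classes) / length(classes)
    return(1 - sum(base_prob^2))
  }
  # Share of observations falling into each node
  base_prob <- table(splitvar) / length(splitvar)
  # Class proportions within each node (columns sum to 1)
  crosstab <- table(classes, splitvar)
  crossprob <- prop.table(crosstab, 2)
  No_Node_Gini <- 1 - sum(crossprob[, 1]^2)
  Yes_Node_Gini <- 1 - sum(crossprob[, 2]^2)
  # Weighted average of the two node impurities
  return(sum(base_prob * c(No_Node_Gini, Yes_Node_Gini)))
}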

Now I want to calculate the Gini index for a decision tree split with three nodes (a multiway split).
I have the following table:

Car Class
0 0
0 1
1 0
1 0
2 1

Is it possible to calculate the Gini index for the Car column (which splits into three nodes) in R?
Is it also possible to calculate the Gini index for more than three nodes with the same function?
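
To make the goal concrete, here is a minimal sketch of one possible generalization. It assumes the Gini index of a split is the size-weighted average of the impurity of each child node, which is the same definition the binary function above uses. The name gini_multiway is hypothetical and does not come from any package.

```r
# Hypothetical helper: weighted Gini index for a split with any number of nodes.
gini_multiway <- function(classes, splitvar = NULL) {
  # No split given: Gini impurity of the whole sample
  if (is.null(splitvar)) {
    p <- table(classes) / length(classes)
    return(1 - sum(p^2))
  }
  # Share of observations falling into each node
  node_weight <- table(splitvar) / length(splitvar)
  # Class proportions within each node (each column sums to 1)
  crossprob <- prop.table(table(classes, splitvar), 2)
  # Impurity of every node at once, instead of indexing columns 1 and 2
  node_gini <- 1 - colSums(crossprob^2)
  sum(node_weight * node_gini)
}

# The table from the question
df <- data.frame(Car = c(0, 0, 1, 1, 2), Class = c(0, 1, 0, 0, 1))
gini_multiway(df$Class)          # impurity before splitting: 0.48
gini_multiway(df$Class, df$Car)  # weighted impurity after splitting on Car: 0.2
```

Because colSums() handles however many columns the cross table has, the same sketch should cover two, three, or more nodes, as long as every level of the split variable actually occurs in the data.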
