I've recently been working with rpart and ran into a calculation I don't understand.
When splitting on information gain, how is "improve" (or variable importance; the two seem to be the same in my tests) calculated?
As a dummy example, I tried learning the following table:
happy,class
yes,p
no,n
with the command:
fit <- rpart(class ~ happy, data = train, parms = list(split = "information"), minsplit = 0)
It's simple and returns the expected tree: a root with two leaves, each containing one element.
What confuses me is that the improvement reported for the split is 1.386294.
I would expect the improvement here to be 1 (going from entropy 1 at the root to entropy 0 in the children). What am I missing?
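
To make the arithmetic concrete, here is a small base-R sketch of the two numbers involved. The helper `entropy` and the variable names are mine and are not part of rpart. The idea that the reported value might be the entropy drop measured with natural logs and multiplied by the number of rows is only an assumption that happens to reproduce 1.386294; I have not confirmed it against rpart itself.

```r
# Shannon entropy of a vector of class proportions.
# Zero proportions are dropped so that 0 * log(0) counts as 0.
# (Hypothetical helper for illustration, not an rpart function.)
entropy <- function(p, base = 2) {
  p <- p[p > 0]
  -sum(p * log(p, base = base))
}

n      <- 2            # rows in the toy table
parent <- c(0.5, 0.5)  # root: one "p" and one "n"
child  <- c(1, 0)      # each leaf is pure

# Entropy drop in bits (log base 2): the value I expected.
gain_bits <- entropy(parent, 2) -
  (0.5 * entropy(child, 2) + 0.5 * entropy(child, 2))

# Assumption: same drop in nats (natural log), scaled by the row count.
gain_nats <- entropy(parent, exp(1)) -
  (0.5 * entropy(child, exp(1)) + 0.5 * entropy(child, exp(1)))
scaled <- n * gain_nats

print(gain_bits)  # 1
print(scaled)     # 1.386294
```

So 1.386294 equals 2 * log(2), which would be consistent with a natural-log entropy weighted by the number of observations rather than a base-2 entropy per observation.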