I have this data set with a continuous-valued attribute Temperature and a boolean-valued attribute Play Tennis:
Temperature: 40 48 60 72 80 90
Play Tennis: No No Yes Yes Yes No
And this is from a PPT that I was referencing:
I understood each of these steps except the last one, the information gain calculation. I also understood how the candidate threshold values of 54 and 85 are calculated. But the next slide says that temp > 54 is chosen as the best split based on its information gain.
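As I understand it, the candidate thresholds are the midpoints between adjacent sorted temperatures where the Play Tennis label changes. A minimal Python sketch of that step, where `candidate_thresholds` is just an illustrative name and not something from the slides:

```python
temperature = [40, 48, 60, 72, 80, 90]
play_tennis = ["No", "No", "Yes", "Yes", "Yes", "No"]

def candidate_thresholds(values, labels):
    """Return midpoints between adjacent sorted values whose labels differ."""
    pairs = sorted(zip(values, labels))
    return [
        (a + b) / 2
        for (a, label_a), (b, label_b) in zip(pairs, pairs[1:])
        if label_a != label_b
    ]

print(candidate_thresholds(temperature, play_tennis))  # [54.0, 85.0]
```

The label changes between 48 and 60 and again between 80 and 90, which gives 54 and 85.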
But based on my paper calculations for this example, I am getting that temp > 85 should be chosen, as its gain value is higher!
My calculation:
Please excuse the paper I used. I do my calculations on any blank sheet I find in my room, to avoid wasting paper.
Based on this, I think I am lost somewhere. Does anyone have any idea how they concluded that temp > 54 gives the best information gain?
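For cross-checking, here is a minimal Python sketch that recomputes the gain for both thresholds. It assumes the usual definition: entropy of the whole set (log base 2) minus the size-weighted entropy of the two partitions. The function names are illustrative, and the slides may define gain differently.

```python
from math import log2

temperature = [40, 48, 60, 72, 80, 90]
play_tennis = ["No", "No", "Yes", "Yes", "Yes", "No"]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    total = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        total -= p * log2(p)
    return total

def information_gain(values, labels, threshold):
    """Gain from splitting into value <= threshold and value > threshold."""
    below = [l for v, l in zip(values, labels) if v <= threshold]
    above = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    # Weighted average entropy of the two partitions after the split.
    remainder = (len(below) / n) * entropy(below) + (len(above) / n) * entropy(above)
    return entropy(labels) - remainder

for t in (54, 85):
    gain = information_gain(temperature, play_tennis, t)
    print(f"temp > {t}: gain = {gain:.3f}")
```

Under these assumptions the script reports a gain of about 0.459 for temp > 54 and about 0.191 for temp > 85, which agrees with the slide's choice. Comparing each intermediate entropy against the hand calculation may show where the two diverge.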