
I have this data set with a continuous-valued attribute Temperature and a Boolean-valued attribute Play Tennis:

Temperature:    40    48    60    72    80    90
Play Tennis:    No    No    Yes   Yes   Yes   No

And this is from a PPT that I was referencing: [image: slide from the PPT]

I understood each of these steps except the last one, the information gain calculation. I also understood how the candidate threshold values of 54 and 85 are calculated. But the next slide says that temp > 54 is chosen because it has the best information gain.

[image: next slide]

But based on my paper calculations for this example, I am getting that temp > 85 should be chosen, as its gain value is higher!

My calculation:

[image: handwritten calculation]

Please excuse the paper I used. I do my calculations on any blank paper I find in my room, to avoid wasting paper.

Based on this, I think I am lost somewhere. Does anyone have any idea how they concluded that temp > 54 has the best information gain?
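For reference, here is a minimal sketch of the gain calculation for both candidate thresholds. It assumes the usual definition (entropy of the whole set minus the size-weighted entropy of the two partitions) with log base 2; the slide's exact formula is not reproduced in the text, so that definition is an assumption. The helper names `entropy` and `information_gain` are made up for this illustration.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * log2(p)
    return result

def information_gain(temps, labels, threshold):
    """Gain from splitting the samples into temp <= threshold and temp > threshold."""
    below = [l for t, l in zip(temps, labels) if t <= threshold]
    above = [l for t, l in zip(temps, labels) if t > threshold]
    n = len(labels)
    # Each partition's entropy is weighted by its share of the samples.
    weighted = (len(below) / n) * entropy(below) + (len(above) / n) * entropy(above)
    return entropy(labels) - weighted

# Data from the table in the question.
temps = [40, 48, 60, 72, 80, 90]
play = ["No", "No", "Yes", "Yes", "Yes", "No"]

gain_54 = information_gain(temps, play, 54)
gain_85 = information_gain(temps, play, 85)
print(f"gain(temp > 54) = {gain_54:.3f}")
print(f"gain(temp > 85) = {gain_85:.3f}")
```

Under that definition, the split at 54 leaves a pure partition {No, No} and a partition with 3 Yes / 1 No, giving a gain of about 0.459. The split at 85 leaves 3 Yes / 2 No on one side and a single No on the other, giving about 0.191. So temp > 54 comes out ahead, which matches the slide. One possible source of a different result on paper is leaving out the size weights (4/6 and 5/6 here) on the partition entropies.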

Vpp Man
  • Temp>54 has more conditions (55, 56, 57, ...) than Temp>85 (only 86, 87, ...), so it has more levels of freedom to choose from (like energy levels). Having fewer available states means less information to save, since entropy means useless state changes or impurities (info = purity?). – huseyin tugrul buyukisik Jan 16 '15 at 20:09
  • Thank you. So you are saying that if there are more samples, the information gain would be higher? Can you please provide a sample calculation based on the above example? I am still confused. I feel dumb and cannot sleep (it's 2am here), because I have spent several hours on this information gain part. – Vpp Man Jan 16 '15 at 20:32
  • Are you sure you got the concept of information gain? The idea is to compare the entropy before and after inserting an extra node. Your goal is to reduce the entropy (= uncertainty) by inserting splits. I would suggest having a look at this question: http://stackoverflow.com/questions/1859554/what-is-entropy-and-information-gain – cel Jan 16 '15 at 20:37
