
I found packages that calculate "Information Gain" for selecting the main attributes in a C4.5 decision tree, and I tried using them to compute it.

However, each package returns different values, as shown below.

    > IG.CORElearn <- attrEval(In_Occu ~ In_Temp+In_Humi+In_CO2+In_Illu+In_LP+Out_Temp+Out_Humi, dataUSE1, estimator = "InfGain")
    > IG.RWeka     <- InfoGainAttributeEval(In_Occu ~ In_Temp+In_Humi+In_CO2+In_Illu+In_LP+Out_Temp+Out_Humi, dataUSE1)
    > IG.FSelector <- information.gain(In_Occu ~ In_Temp+In_Humi+In_CO2+In_Illu+In_LP+Out_Temp+Out_Humi, dataUSE1)

    > IG.CORElearn
       In_Temp    In_Humi     In_CO2    In_Illu      In_LP   Out_Temp   Out_Humi 
    0.04472928 0.02705100 0.09305418 0.35064927 0.44299167 0.01832216 0.05551973 
    > IG.RWeka
       In_Temp    In_Humi     In_CO2    In_Illu      In_LP   Out_Temp   Out_Humi 
    0.11964771 0.04340197 0.12266724 0.38963327 0.44299167 0.03831816 0.07705798 
    > IG.FSelector
             attr_importance
    In_Temp       0.08293347
    In_Humi       0.02919697
    In_CO2        0.08411316
    In_Illu       0.27007321
    In_LP         0.30705843
    Out_Temp      0.02656012
    Out_Humi      0.05341252

Why does each package calculate a different result?
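
One thing worth noting (a quick check on the printed values, not an answer): several of the FSelector values, divided by `log(2)` (i.e. converted from natural-log entropy to base-2 entropy), reproduce the corresponding RWeka values to the printed precision:

    > 0.30705843 / log(2)   # FSelector's In_LP, nats -> bits
    [1] 0.4429917           # RWeka's In_LP is 0.44299167
    > 0.27007321 / log(2)   # FSelector's In_Illu, nats -> bits
    [1] 0.3896333           # RWeka's In_Illu is 0.38963327

In_Humi and In_CO2 do not convert this way, so the logarithm base is probably not the whole story.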

    This is more of a methodology question than a coding question. The obvious answer is that they use different definitions of the "information" metric. I suggest you choose either the beta Data Science site, http://datascience.stackexchange.com, or the established CrossValidated.com section of Stack Exchange. – IRTFM Jan 10 '17 at 01:52
    We do not have your data and so cannot reproduce your results, but a small step towards an answer would be to add the parameter `unit="log2"` in your `FSelector` example and then compare with the RWeka version. – G5W Jan 10 '17 at 02:15
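
A minimal sketch of G5W's suggestion (assuming the same `dataUSE1` data frame as in the question; `unit = "log2"` is the parameter named in the comment, and it switches FSelector from natural-log entropy to the base-2 entropy that Weka reports):

    library(FSelector)

    # default: entropy measured in natural-log units (nats)
    IG.nats <- information.gain(In_Occu ~ In_Temp + In_Humi + In_CO2 + In_Illu +
                                  In_LP + Out_Temp + Out_Humi, dataUSE1)

    # base-2 entropy (bits), for a like-for-like comparison with RWeka/Weka
    IG.bits <- information.gain(In_Occu ~ In_Temp + In_Humi + In_CO2 + In_Illu +
                                  In_LP + Out_Temp + Out_Humi, dataUSE1,
                                unit = "log2")

    # the two should differ only by the constant factor log(2)
    all.equal(IG.bits$attr_importance, IG.nats$attr_importance / log(2))

Any attributes that still disagree with RWeka after this conversion would point at a second source of difference, most likely how each package discretizes the numeric predictors before computing the gain.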

0 Answers