
I've constructed a decision tree that treats every sample as equally weighted. Now I want to construct a decision tree that gives different weights to different samples. Is the only change I need to make in finding the expected entropy before calculating information gain? I'm a little confused about how to proceed, please explain.

For example: consider a node containing p positive and n negative samples. The node's entropy will be -p/(p+n) log(p/(p+n)) - n/(p+n) log(n/(p+n)). Now suppose a split is found that divides the parent node into two child nodes, where child 1 contains p' positives and n' negatives (so child 2 contains p-p' and n-n'). For child 1 we calculate the entropy the same way as for the parent and weight it by the probability of reaching it, i.e. (p'+n')/(p+n). The expected reduction in entropy is then entropy(parent) - (prob of reaching child1 * entropy(child1) + prob of reaching child2 * entropy(child2)), and the split with maximum information gain is chosen.

Now I want to do this same procedure when we have weights available for each sample. What changes need to be made? And what changes need to be made specifically for AdaBoost (using stumps only)?
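For concreteness, here is roughly how my unweighted version computes information gain (a minimal sketch; the helper names are just illustrative):

```python
import math

def entropy(p, n):
    """Entropy of a node with p positive and n negative samples."""
    total = p + n
    ent = 0.0
    for count in (p, n):
        if count > 0:
            frac = count / total
            ent -= frac * math.log2(frac)
    return ent

def information_gain(p, n, p1, n1):
    """Gain of splitting a (p, n) parent into child 1 = (p1, n1)
    and child 2 = (p - p1, n - n1)."""
    p2, n2 = p - p1, n - n1
    total = p + n
    expected = ((p1 + n1) / total) * entropy(p1, n1) + \
               ((p2 + n2) / total) * entropy(p2, n2)
    return entropy(p, n) - expected
```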

Yank Leo
  • It's straightforward. The only thing you need to compute entropy is the expected p_i - the proportion of events in the leaf. Compute them using weights: p_1 = sum of weights of class 1 events / sum of weights of all events (note that when all weights are equal, these coincide with the simple proportions). – Alleo Apr 18 '16 at 20:16
  • @Alleo Does that mean the weights work like the probability of a sample occurring? So if we have m examples initially, each sample would have weight 1/m, and the node's entropy would be -(p/m)log(p/m) - ((m-p)/m)log((m-p)/m), where p is the number of positive samples in the node. Could you please elaborate and explain the complete methodology? – Yank Leo Apr 19 '16 at 16:56
  • @Alleo So does weighting come in when calculating the current node's entropy as well, or just the expected entropy? – Yank Leo Apr 19 '16 at 17:11
  • Weighting comes in both. The expression is correct. Weights are proportional to the probability of a sample occurring. The simplest way to check you have done everything correctly is to multiply all weights by any positive number and check that the selected threshold does not change (a sketch of this check follows the comments). – Alleo Apr 19 '16 at 23:19
  • @Alleo I have gotten myself very confused; could you just take an example and explain? – Yank Leo Apr 20 '16 at 08:34
  • Suppose I have m examples, with p positive and (m-p) negative examples. Each example initially weighs 1/m, so the current node's entropy will be -(p/m)log(p/m) - ((m-p)/m)log((m-p)/m). Now after splitting I get K points in the left node and J points in the right node. Out of the K on the left I have y positive and K-y negative. So now what would be the entropy and expected entropy of the left child? – Yank Leo Apr 20 '16 at 08:40
  • @Alleo From what I have understood, the entropy of the left child would be: -(y/m)log(y/m) - ((K-y)/m)log((K-y)/m). Is this correct? Because I'm not getting the intuition behind it. Also, what would the expected entropy be? – Yank Leo Apr 20 '16 at 08:43
  • 2
    Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap May 09 '16 at 15:25
  • @ProcrastinatusMaximus added the needed example. – Yank Leo May 09 '16 at 17:03
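
A minimal sketch of the proportions-from-weights idea in the comments above (plain Python; the function name is illustrative), including the rescaling sanity check @Alleo describes:

```python
def weighted_class_proportions(labels, weights):
    """p_c = (sum of weights of class-c samples) / (sum of all weights).
    With equal weights this reduces to the ordinary class frequencies."""
    total = sum(weights)
    sums = {}
    for y, w in zip(labels, weights):
        sums[y] = sums.get(y, 0.0) + w
    return {c: s / total for c, s in sums.items()}

# Sanity check from the comments: rescaling every weight by the same
# positive constant leaves the proportions (and hence the chosen split)
# unchanged.
labels  = [1, 1, 0, 0, 0]
weights = [4.0, 1.0, 2.0, 2.0, 1.0]
print(weighted_class_proportions(labels, weights))                    # {1: 0.5, 0: 0.5}
print(weighted_class_proportions(labels, [10 * w for w in weights]))  # same values
```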

1 Answer


(I guess this is the same idea as in some comments, e.g., @Alleo)

Suppose you have p positive examples and n negative examples. Let's denote the weights of examples to be:

a1, a2, ..., ap  ----------  weights of the p positive examples
b1, b2, ..., bn  ----------  weights of the n negative examples

Suppose

a1 + a2 + ... + ap = A 
b1 + b2 + ... + bn = B

As you pointed out, if the examples have unit weights, the entropy would be:

    - p/(p+n) log(p/(p+n)) - n/(p+n) log(n/(p+n))

Now you only need to replace p with A and n with B to obtain the new instance-weighted entropy:

    - A/(A+B) log(A/(A+B)) - B/(A+B) log(B/(A+B))

Note: nothing fancy here. All we did was figure out the weighted importance of the groups of positive and negative examples. When examples are equally weighted, the importance of the positive examples is proportional to the number of positive examples relative to the total number of examples. When examples are non-equally weighted, we just take a weighted average to get the importance of the positive examples.

Then you follow the same logic to choose the attribute with the largest information gain, by comparing the entropy before splitting with the expected entropy after splitting on each attribute.
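
A minimal sketch of the above (assuming binary classes and base-2 logs; the function names are illustrative, not from any particular library):

```python
import math

def weighted_entropy(A, B):
    """Entropy with the positive count replaced by A (summed weights of
    positive examples) and the negative count replaced by B (summed
    weights of negative examples)."""
    total = A + B
    ent = 0.0
    for s in (A, B):
        if s > 0:
            frac = s / total
            ent -= frac * math.log2(frac)
    return ent

def weighted_information_gain(parent, left, right):
    """Each argument is a pair (A, B) of summed positive/negative weights.
    A child's probability of being reached is its share of the parent's
    total weight."""
    parent_total = sum(parent)
    gain = weighted_entropy(*parent)
    for child in (left, right):
        gain -= (sum(child) / parent_total) * weighted_entropy(*child)
    return gain
```

When all weights are equal this reduces exactly to the count-based formulas in the question, which is the sanity check mentioned in the comments.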

greeness