After using OpenCV for boosting, I'm trying to implement my own version of the AdaBoost
algorithm (check here, here and the original paper for some references).
While reading all the material I've come up with some questions regarding the implementation of the algorithm.
1) It is not clear to me how the weights a_t of each weak learner are assigned.
In all the sources I've pointed out, the choice is a_t = k * ln( (1-e_t) / e_t ), with k being a positive constant and e_t the error rate of the particular weak learner.
On page 7 of this source it says that this particular value minimizes a certain convex differentiable function, but I really don't understand that passage.
Can anyone please explain it to me?
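To make the question concrete, here is a minimal numerical sketch. It assumes (this is my reading, not something I can confirm from the source) that the convex function in question is the normalization factor Z(a) = (1-e_t) * e^(-a) + e_t * e^(a), with labels in {-1, +1}, e_t the weighted error rate, and k = 1/2. The function names are hypothetical. Setting dZ/da = 0 gives a = 0.5 * ln( (1-e_t) / e_t ), and the grid search below agrees with that closed form.

```python
import math

def normalizer(alpha, err):
    """Assumed objective: Z(alpha) = (1 - err) * exp(-alpha) + err * exp(alpha)."""
    return (1.0 - err) * math.exp(-alpha) + err * math.exp(alpha)

def closed_form_alpha(err):
    """Candidate minimizer with k = 1/2 (assumption): 0.5 * ln((1 - err) / err)."""
    return 0.5 * math.log((1.0 - err) / err)

def grid_argmin(err, lo=-5.0, hi=5.0, steps=100001):
    """Brute-force search for the alpha that minimizes the normalizer on a grid."""
    best_alpha, best_val = lo, normalizer(lo, err)
    for i in range(steps):
        a = lo + (hi - lo) * i / (steps - 1)
        v = normalizer(a, err)
        if v < best_val:
            best_alpha, best_val = a, v
    return best_alpha

err = 0.2                                # example weighted error, must lie in (0, 1)
alpha_star = closed_form_alpha(err)      # closed-form value
alpha_grid = grid_argmin(err)            # numerical minimizer
z_min = normalizer(alpha_star, err)      # equals 2 * sqrt(err * (1 - err))
print(alpha_star, alpha_grid, z_min)
```

If this reading is right, the minimum value is 2 * sqrt(e_t * (1-e_t)), which is below 1 whenever e_t differs from 1/2.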
2) I have some doubts about the procedure for updating the weights of the training samples.
Clearly it should be done in such a way as to guarantee that they remain a probability distribution. All the references adopt this choice:
D_{t+1}(i) = D_t(i) * e^(-a_t * y_i * h_t(x_i)) / Z_t (where Z_t is a normalization factor chosen so that D_{t+1} is a distribution).
- But why is the weight update multiplicative, using the exponential of a quantity derived from the error rate of the particular weak learner?
- Are any other updates possible? And if so, is there a proof that this update guarantees some kind of optimality of the learning process?
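For reference, this is how I understand one round of the update, as a minimal sketch. It assumes labels and predictions in {-1, +1}, k = 1/2, and a weighted error strictly between 0 and 1. The function name `adaboost_reweight` and the toy data are hypothetical. One thing the sketch makes visible: after the update, the weak learner that was just trained has weighted error exactly 1/2 on the new distribution.

```python
import math

def adaboost_reweight(weights, labels, preds):
    """One AdaBoost round: returns (new_weights, alpha, err).

    Assumes labels and preds take values in {-1, +1}, weights sum to 1,
    and the weighted error lies strictly between 0 and 1.
    """
    # Weighted error rate of the weak learner under the current distribution.
    err = sum(w for w, y, h in zip(weights, labels, preds) if y != h)
    # Learner weight with k = 1/2 (assumption).
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Multiplicative update: y * h is +1 when correct, -1 when wrong.
    unnormalized = [w * math.exp(-alpha * y * h)
                    for w, y, h in zip(weights, labels, preds)]
    z = sum(unnormalized)                # normalization factor Z_t
    return [u / z for u in unnormalized], alpha, err

labels = [1, 1, -1, -1, 1]
preds = [1, -1, -1, -1, 1]               # the weak learner errs on the second sample
weights = [0.2] * 5                      # uniform initial distribution D_1

new_weights, alpha, err = adaboost_reweight(weights, labels, preds)
# Weighted error of the same weak learner under the updated distribution.
new_err = sum(w for w, y, h in zip(new_weights, labels, preds) if y != h)
print(new_weights, alpha, err, new_err)
```

In this toy run the misclassified sample ends up carrying half of the total weight, so the next weak learner cannot do well by repeating the same mistake.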
I hope this is the right place to post this question; if not, please redirect me!
Thanks in advance for any help you can provide.