
I am trying to implement an application that uses the AdaBoost algorithm. I know that AdaBoost uses a set of weak classifiers, but I don't know what these weak classifiers are. Can you explain them to me with an example, and tell me whether I have to create my own weak classifiers or whether I'm supposed to use some kind of existing algorithm?

gadzix90

2 Answers


Weak classifiers (or weak learners) are classifiers which perform only slightly better than random guessing. They thus have some ability to predict the right labels, but not as much as strong classifiers such as, e.g., Naive Bayes, Neural Networks, or SVMs.

One of the simplest weak classifiers is the Decision Stump, a one-level Decision Tree. It selects a threshold for a single feature and splits the data on that threshold. AdaBoost will then train an army of these Decision Stumps, each of which focuses on one aspect of the data.
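For illustration, a minimal Decision Stump might look like the Java sketch below. The class and method names are made up for this example; the `fit` method simply tries every training value of every feature as a candidate threshold and keeps the split with the lowest weighted error:

```java
// Minimal decision stump sketch (hypothetical names): it thresholds a
// single feature and predicts +1 or -1. AdaBoost combines many of these.
public class DecisionStump {
    int featureIndex;   // index of the feature to threshold
    double threshold;   // split value
    int polarity = 1;   // whether values above the threshold predict +1 or -1

    // Predict +1 or -1 for one example.
    public int predict(double[] features) {
        return features[featureIndex] > threshold ? polarity : -polarity;
    }

    // Exhaustively try every (feature, threshold, polarity) candidate and
    // keep the one with the lowest weighted error on the training set.
    public void fit(double[][] X, int[] y, double[] weights) {
        double bestError = Double.MAX_VALUE;
        for (int f = 0; f < X[0].length; f++) {
            for (double[] candidate : X) {
                for (int p : new int[]{1, -1}) {
                    double error = 0.0;
                    for (int i = 0; i < X.length; i++) {
                        int prediction = X[i][f] > candidate[f] ? p : -p;
                        if (prediction != y[i]) error += weights[i];
                    }
                    if (error < bestError) {
                        bestError = error;
                        featureIndex = f;
                        threshold = candidate[f];
                        polarity = p;
                    }
                }
            }
        }
    }
}
```

AdaBoost would call `fit` once per boosting round, each time with updated example weights, so successive stumps concentrate on the examples the earlier ones got wrong.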

Sicco
  • I decided to use Decision Stumps. My idea for the application is recognizing the type of an elephant. My Elephant class has the fields: int size, int weight, double sampleWeight, ElephantType type (which can be Asian or African). I'd like to know whether I am supposed to create only 2 decision stumps (1 for size and 1 for weight) or whether I should make more decision stumps (a few for size and a few for weight)? – gadzix90 Aug 24 '12 at 13:31
  • @AjMeen Since a decision stump is by definition only single-level, you can't use two decision stumps one after the other. The best way to solve your problem IMO would be to create a 2D decision stump based on these two distinct features. This way you'll be taking both features into account in the (single) decision stump: let's say `x=size`, `y=weight`; then your stump would be (for example) a threshold on their 2D Euclidean length: `if sqrt(x^2 + y^2) > 6 then return +1 else return -1`. I chose the condition `> 6` arbitrarily, just to illustrate the point (see the sketch after these comments). – Ory Band May 29 '14 at 18:09
  • @AjMeen When I said you can't use more than one decision stump, I meant "..in a single iteration". You should train a single decision stump on every iteration in adaboost. – Ory Band May 29 '14 at 18:48
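As a concrete illustration of the comments above, here is a sketch of such a 2D stump. The class name is hypothetical, and the threshold of 6 is the comment's arbitrary example value, not something learned from data; in a real implementation the threshold would be fitted against the weighted training set each boosting round:

```java
// Sketch of the 2D decision stump from the comment above (hypothetical
// class name). It collapses size and weight into one scalar and thresholds
// that, so a single stump still takes both features into account.
public class EuclideanStump {
    double threshold = 6.0; // arbitrary example value from the comment

    public int predict(double size, double weight) {
        return Math.sqrt(size * size + weight * weight) > threshold ? 1 : -1;
    }
}
```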

When I used AdaBoost, my weak classifiers were basically thresholds, one for each data attribute. Each threshold needs to classify more than 50% of the (weighted) examples correctly; otherwise it is no better than random guessing.
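To make that concrete, here is a sketch of the boosting loop under the same assumptions, reusing the hypothetical `DecisionStump` class sketched in the other answer. The `error >= 0.5` check is exactly the "more than 50%" requirement: a stump that can't beat random guessing is useless to AdaBoost:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AdaBoostSketch {
    public static void main(String[] args) {
        // Toy data: rows are feature vectors, labels are +1 / -1.
        double[][] X = {{2, 3}, {4, 1}, {5, 6}, {1, 1}};
        int[] y = {1, -1, 1, -1};

        double[] w = new double[X.length];  // example weights
        Arrays.fill(w, 1.0 / X.length);     // start uniform

        List<DecisionStump> stumps = new ArrayList<>();
        List<Double> alphas = new ArrayList<>();

        for (int round = 0; round < 10; round++) {
            DecisionStump stump = new DecisionStump();
            stump.fit(X, y, w);

            // Weighted error of this round's stump.
            double error = 0.0;
            for (int i = 0; i < X.length; i++)
                if (stump.predict(X[i]) != y[i]) error += w[i];

            // A weak learner must beat random guessing.
            if (error >= 0.5) break;

            double alpha = 0.5 * Math.log((1.0 - error) / (error + 1e-10));

            // Increase weights of misclassified examples, then renormalize.
            double sum = 0.0;
            for (int i = 0; i < X.length; i++) {
                w[i] *= Math.exp(-alpha * y[i] * stump.predict(X[i]));
                sum += w[i];
            }
            for (int i = 0; i < X.length; i++) w[i] /= sum;

            stumps.add(stump);
            alphas.add(alpha);
        }

        // Final prediction: sign of the alpha-weighted vote of all stumps.
        double[] sample = {3, 2};
        double vote = 0.0;
        for (int t = 0; t < stumps.size(); t++)
            vote += alphas.get(t) * stumps.get(t).predict(sample);
        System.out.println(vote >= 0 ? "class +1" : "class -1");
    }
}
```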

Here is a good presentation about AdaBoost and how to calculate those weak classifiers: https://user.ceng.metu.edu.tr/~tcan/ceng734_f1112/Schedule/adaboost.pdf

marc_ferna