
I have to solve a multiclass classification problem in Python.

I started to use ensembles, beginning with AdaBoostClassifier, but after a grid search I get bad results.

What I did was to use the tuned classifier (from the list of classifiers I tried) that showed the best score as the base estimator: an SVC().

Then I did a grid search on the other parameters of AdaBoostClassifier:

n_estimators: [1,50,100,150]
learning_rate: [0.1,0.4,0.7,1]
algorithm: ['SAMME']

Now I have 3 questions for you:

  1. Why does the tuned SVC() show an f1_macro score of 82.5%, while AdaBoostClassifier with only 1 estimator shows 18.6%?
  2. Why am I unable to improve the f1_macro score with more than 1 estimator in AdaBoostClassifier?
  3. Is it possible that boosting makes things worse on my dataset, or am I doing something wrong?

This is my code:

from sklearn import svm
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

def adaBoost_try(train_x, train_y, test_x, test_y):
    # scaler, scaler_quantile, pca and cachedir are defined elsewhere in my code
    base_estimator = svm.SVC(C=60, class_weight=None, decision_function_shape='ovo',
                             kernel='rbf', gamma=0.1, random_state=0)
    classifier = AdaBoostClassifier()
    pipeline = [
        ('scaler', scaler),
        ('reduce_dim', pca),
        ('classifier', classifier)]
    best_params = [{
        'scaler': [scaler_quantile],
        'reduce_dim': [pca],
        'reduce_dim__n_components': [15],
        'classifier__base_estimator': [base_estimator],
        'classifier__n_estimators': [1, 50, 100, 150],
        'classifier__learning_rate': [0.1, 0.4, 0.7, 1],
        'classifier__algorithm': ['SAMME'],
        'classifier__random_state': [0]
    }]
    pipe = Pipeline(pipeline, memory=cachedir)
    my_scoring = 'f1_macro'
    n_folds = 5
    gscv = GridSearchCV(pipe, param_grid=best_params, scoring=my_scoring,
                        n_jobs=-1, cv=n_folds, refit=True)
    gscv.fit(train_x, train_y)
    print(gscv.best_params_)
    print(gscv.best_score_)
    print(gscv.score(test_x, test_y))
fabianod

1 Answer


Usually, adopting an ensemble method will outperform a single predictor.

AdaBoost, in particular, fits multiple base classifiers sequentially. The algorithm modifies the weights of the examples before training each classifier, so that each classifier specializes on (or "gives more attention to") the examples misclassified by the previous one.
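The reweighting loop can be sketched roughly like this (a simplified illustration of the multiclass SAMME update, not scikit-learn's actual code; the real algorithm also folds in the learning rate and the toy dataset here is just a placeholder):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy multiclass data (placeholder for your own dataset)
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
K = 3                                  # number of classes
w = np.full(len(y), 1.0 / len(y))      # start from uniform sample weights

for m in range(5):
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y, sample_weight=w)   # base learner trains on current weights
    miss = stump.predict(X) != y
    # weighted error of this learner, clipped to keep the log finite
    err = np.clip(np.average(miss, weights=w), 1e-10, 1 - 1e-10)
    alpha = np.log((1 - err) / err) + np.log(K - 1)  # SAMME estimator weight
    w *= np.exp(alpha * miss)          # upweight the misclassified examples
    w /= w.sum()                       # renormalise
```

Each round, examples the previous stump got wrong carry more weight, which is exactly what forces the next learner to "give more attention" to them.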

It is a bit unexpected that a single SVC would outperform an AdaBoost of SVCs.

My main suggestion would be to grid-search the hyperparameters of the SVC along with the hyperparameters of the AdaBoostClassifier (see Using GridSearchCV with AdaBoost and DecisionTreeClassifier for details on how to implement this). Depending on the size of your dataset, adopting many estimators might overfit your data. If you have a small dataset, I would suggest trying a number of estimators between 1 and 50 in your grid search.
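A rough sketch of that nested tuning (the iris data and the parameter ranges are just placeholders for your own setup; note that newer scikit-learn releases renamed `base_estimator` to `estimator`, which the sketch detects via `inspect`):

```python
import inspect

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # placeholder dataset

# 'base_estimator' became 'estimator' in newer scikit-learn versions,
# so look up which keyword this installation accepts.
params = inspect.signature(AdaBoostClassifier.__init__).parameters
est_key = 'estimator' if 'estimator' in params else 'base_estimator'

ada_kwargs = {est_key: SVC(), 'random_state': 0}
if 'algorithm' in params:
    # SVC() has no predict_proba, so the older SAMME.R default would fail
    ada_kwargs['algorithm'] = 'SAMME'

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('reduce_dim', PCA()),
    ('classifier', AdaBoostClassifier(**ada_kwargs)),
])

# Double underscores reach through the ensemble into the SVC itself,
# so the base estimator is tuned jointly with the boosting parameters.
param_grid = {
    'reduce_dim__n_components': [2, 3],
    f'classifier__{est_key}__C': [1, 10],
    f'classifier__{est_key}__gamma': ['scale', 0.1],
    'classifier__n_estimators': [1, 10, 25],
    'classifier__learning_rate': [0.5, 1.0],
}

gscv = GridSearchCV(pipe, param_grid, scoring='f1_macro', cv=3, n_jobs=-1)
gscv.fit(X, y)
print(gscv.best_params_, round(gscv.best_score_, 3))
```

This way a weaker, faster SVC (e.g. a smaller C) can be selected for the ensemble, instead of boosting an SVC that was tuned to be strong on its own.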

bqbastos
  • I added my code; as you can see I have done the grid search, but I still get a worse score... the unusual thing is that I get a worse score even if I use BaggingClassifier with only 1 estimator... I don't understand why... – fabianod Jul 14 '20 at 09:55