
Normally we use GridSearchCV to perform a grid search over the hyperparameters of one particular model, for example:

model_ada = AdaBoostClassifier()
params_ada = {'n_estimators':[10,20,30,50,100,500,1000], 'learning_rate':[0.5,1,2,5,10]}
grid_ada = GridSearchCV(estimator = model_ada, param_grid = params_ada, scoring = 'accuracy', cv = 5, verbose = 1, n_jobs = -1)
grid_ada.fit(X_train, y_train)

Is there any technique or function that allows us to perform a grid search over the ML models themselves? For example, I want to do something like the following:

models = {'model_gbm':GradientBoostingClassifier(), 'model_rf':RandomForestClassifier(), 'model_dt':DecisionTreeClassifier(), 'model_svm':SVC(), 'model_ada':AdaBoostClassifier()}
params_gbm = {'learning_rate':[0.1,0.2,0.3,0.4], 'n_estimators':[50,100,500,1000,2000]}
params_rf = {'n_estimators':[50,100,500,1000,2000]}
params_dt = {'splitter':['best','random'], 'max_depth':[1, 5, 10, 50, 100]}
params_svm = {'C':[1,2,5,10,50,100,500], 'kernel':['rbf','poly','sigmoid','linear']}
params_ada = {'n_estimators':[10,20,30,50,100,500,1000], 'learning_rate':[0.5,1,2,5,10]}
params = {'params_gbm':params_gbm, 'params_rf':params_rf, 'params_dt':params_dt, 'params_svm':params_svm, 'params_ada':params_ada}
grid_ml = "that function"(models = models, params = params)
grid_ml.fit(X_train, y_train)

where "that function" is the function which I need to use to perform this type of operation.

desertnaut
Mujeebur Rahman

2 Answers


I faced a similar issue too, but couldn't find a predefined package or method that achieves this, so I wrote my own function:

def Algo_search(models, params):
    # Note: relies on global X_train, X_test, y_train, y_test
    max_score = 0
    max_model = None
    max_model_params = None

    for name, model in models.items():
        gs = GridSearchCV(estimator=model, param_grid=params[name])
        gs.fit(X_train, y_train)
        score = gs.score(X_test, y_test)  # accuracy on the held-out test set

        if score > max_score:
            max_score = score
            max_model = gs.best_estimator_
            max_model_params = gs.best_params_

    return max_score, max_model, max_model_params

# Data points
models = {'model_gbm': GradientBoostingClassifier(), 'model_rf': RandomForestClassifier(),
          'model_dt': DecisionTreeClassifier(), 'model_svm': SVC(), 'model_ada': AdaBoostClassifier()}
params_gbm = {'learning_rate': [0.1, 0.2, 0.3, 0.4], 'n_estimators': [50, 100, 500, 1000, 2000]}
params_rf = {'n_estimators': [50, 100, 500, 1000, 2000]}
params_dt = {'splitter': ['best', 'random'], 'max_depth': [1, 5, 10, 50, 100]}
params_svm = {'C': [1, 2, 5, 10, 50, 100, 500], 'kernel': ['rbf', 'poly', 'sigmoid', 'linear']}
params_ada = {'n_estimators': [10, 20, 30, 50, 100, 500, 1000], 'learning_rate': [0.5, 1, 2, 5, 10]}
params = {'model_gbm': params_gbm, 'model_rf': params_rf, 'model_dt': params_dt,
          'model_svm': params_svm, 'model_ada': params_ada}
grid_ml = Algo_search(models=models, params=params)
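A minimal, self-contained run of the same idea (a sketch, not part of the original answer: it assumes iris data, a train/test split, and deliberately small grids to keep it fast):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), random_state=0)

def Algo_search(models, params):
    # Same loop as the answer: grid-search each model, keep the best by
    # held-out test-set score.
    max_score, max_model, max_model_params = 0, None, None
    for name, model in models.items():
        gs = GridSearchCV(estimator=model, param_grid=params[name])
        gs.fit(X_train, y_train)
        score = gs.score(X_test, y_test)
        if score > max_score:
            max_score = score
            max_model = gs.best_estimator_
            max_model_params = gs.best_params_
    return max_score, max_model, max_model_params

models = {'model_svm': SVC(), 'model_ada': AdaBoostClassifier()}
params = {'model_svm': {'C': [1, 10]}, 'model_ada': {'n_estimators': [10, 50]}}
score, model, best_params = Algo_search(models, params)
print(round(score, 3), type(model).__name__)
```

Note that the keys of `models` and `params` must match exactly, since the function looks up `params[name]` by the model's key.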
Sahil_Angra
    @MujeeburRahman Bro I think you just picked the shorter answer. Mine tackled the problem in detail so you know the consequences of your implementation. You clearly here missed my point about the usage of `OrderedDict`, as this answer will not work across most of the Python Implementations (even if it works now on your current Python version). Look [here](https://stackoverflow.com/a/39980744/6365112) to understand why. – Yahya Dec 28 '20 at 13:05
  • You are right bro.. Sorry not to mention earlier, but I liked both the solutions. I use both of these solutions in different places. Unfortunately I cannot accept two answers, so I accepted the answer which suited my requirement at that situation. Currently I am using your solution for a different scenario. Thanks for your effort in giving me the solution... – Mujeebur Rahman Apr 23 '21 at 06:00

It should be straightforward to run multiple GridSearchCV instances and then compare the results.

Below is a complete example on how to achieve this.

Note that there is room for improvement, which I will leave to you; this is just to give you an idea of the approach.

from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier, \
RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def get_param(model_name, params):
    """
    Not the most sufficient way.
    I recommend to have params and models
    in OrderedDict() instead.
    """
    for k, v in params.items():
        mn = str(model_name).upper().split('_')
        for k_ in str(k).upper().split('_'):
            if k_ in mn:
                return v


def models_gridSearchCV(models, params, scorer, X, y):
    all_results = {name: None for name in models}
    best_model = {'model_name': None,
                  'best_estimator': None,
                  'best_params': None,
                  'best_score': float('-inf')}
    for model_name, model in models.items():
        print("Processing {} ...".format(model_name))
        # or use OrderedDict() and zip(models, params) above
        # so there will be no need to check
        param = get_param(model_name, params)
        if param is None:
            continue
        clf = GridSearchCV(model, param, scoring=scorer)
        clf.fit(X, y)
        all_results[model_name] = clf.cv_results_
        if clf.best_score_ > best_model.get('best_score'):
            best_model['model_name'] = model_name
            best_model['best_estimator'] = clf.best_estimator_
            best_model['best_params'] = clf.best_params_
            best_model['best_score'] = clf.best_score_

    return best_model, all_results


### TEST ###
iris = datasets.load_iris()
X, y = iris.data, iris.target

# OrderedDict() is recommended here
# to maintain order between models and params 
models = {'model_gbm': GradientBoostingClassifier(),
          'model_rf': RandomForestClassifier(),
          'model_dt': DecisionTreeClassifier(),
          'model_svm': SVC(),
          'model_ada': AdaBoostClassifier()}
params_gbm = {'learning_rate': [0.1, 0.2], 'n_estimators': [50, 100]}
params_rf = {'n_estimators': [50, 100]}
params_dt = {'splitter': ['best', 'random'], 'max_depth': [1, 5]}
params_svm = {'C': [1, 2, 5], 'kernel': ['rbf', 'linear']}
params_ada = {'n_estimators': [10, 100], 'learning_rate': [0.5, 1]}

# OrderedDict() is recommended here
# to maintain order between models and params 
params = {'params_gbm': params_gbm,
          'params_rf': params_rf,
          'params_dt': params_dt,
          'params_svm': params_svm,
          'params_ada': params_ada}

best_model, all_results = models_gridSearchCV(models, params, 'accuracy', X, y)
print(best_model)
# print(all_results)

Result

Processing model_gbm ...
Processing model_rf ...
Processing model_dt ...
Processing model_svm ...
Processing model_ada ...
{'model_name': 'model_svm', 'best_estimator': SVC(C=5), 
 'best_params': {'C': 5, 'kernel': 'rbf'}, 'best_score': 0.9866666666666667}
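The OrderedDict alternative mentioned in the code comments could look like this (a sketch, not part of the original answer): when models and grids are kept in matching order, `zip()` pairs them directly and no name-matching helper like `get_param` is needed.

```python
from collections import OrderedDict
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The two OrderedDicts must list entries in the same order for zip() to
# pair each model with its grid.
models = OrderedDict([('model_svm', SVC()),
                      ('model_dt', DecisionTreeClassifier())])
params = OrderedDict([('params_svm', {'C': [1, 5]}),
                      ('params_dt', {'max_depth': [1, 5]})])

best = None
for (name, model), grid in zip(models.items(), params.values()):
    clf = GridSearchCV(model, grid, scoring='accuracy').fit(X, y)
    if best is None or clf.best_score_ > best[1]:
        best = (name, clf.best_score_, clf.best_params_)
print(best[0], round(best[1], 3))
```

(Plain dicts preserve insertion order in CPython 3.7+, but OrderedDict makes the order dependency explicit and portable, which is the point raised in the comment thread above.)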
Yahya