I just came across this example of model selection using grid search here:
https://chrisalbon.com/machine_learning/model_selection/model_selection_using_grid_search/
Question:
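The example reads:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Create a pipeline
pipe = Pipeline([('classifier', RandomForestClassifier())])

# Create space of candidate learning algorithms and their hyperparameters
# ('l1' needs a solver that supports it, e.g. liblinear; newer scikit-learn defaults to lbfgs)
search_space = [{'classifier': [LogisticRegression(solver='liblinear')],
                 'classifier__penalty': ['l1', 'l2'],
                 'classifier__C': np.logspace(0, 4, 10)},
                {'classifier': [RandomForestClassifier()],
                 'classifier__n_estimators': [10, 100, 1000],
                 'classifier__max_features': [1, 2, 3]}]
```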
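For context, the snippet only defines `pipe` and `search_space`; the search itself happens when both are handed to `GridSearchCV`. A minimal runnable sketch of that step follows. The synthetic data from `make_classification` is an assumption (any `X, y` with at least 3 features would do), and `n_estimators` is trimmed to `[10, 100]` only to keep the run short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic binary data (an assumption; any X, y with >= 3 features works here).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3, random_state=0)

# The step named 'classifier' is a slot; the grid swaps estimators into it.
pipe = Pipeline([('classifier', RandomForestClassifier())])

search_space = [
    {'classifier': [LogisticRegression(solver='liblinear')],
     'classifier__penalty': ['l1', 'l2'],
     'classifier__C': np.logspace(0, 4, 10)},
    {'classifier': [RandomForestClassifier(random_state=0)],
     'classifier__n_estimators': [10, 100],
     'classifier__max_features': [1, 2, 3]},
]

# Each dict is expanded on its own: 2 x 10 LR candidates + 2 x 3 RF candidates = 26.
clf = GridSearchCV(pipe, search_space, cv=5)
best_model = clf.fit(X, y)  # fit() returns the fitted GridSearchCV itself

print(best_model.best_estimator_.named_steps['classifier'])
print(best_model.best_score_)
```

Because the grid is a list of dicts, logistic-regression parameters are never combined with random-forest parameters; each dict is searched separately.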
As I understand the code, `search_space` contains the candidate classifiers and their parameters. However, I don't understand what the purpose of `Pipeline` is, and why it contains `RandomForestClassifier()`.
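One way to see what the `RandomForestClassifier()` inside the `Pipeline` does: `GridSearchCV` takes a single estimator and, for every candidate, clones it and calls `set_params(**candidate)`. The pipeline gives it a named step, `'classifier'`, whose value can be replaced outright, so the estimator written in the constructor is just the initial occupant of that slot. The sketch below does by hand what the grid search does per candidate (the value `C=0.5` is arbitrary).

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# The estimator given here is only the initial occupant of the 'classifier' slot.
pipe = Pipeline([('classifier', RandomForestClassifier())])

# Passing an estimator as a parameter value replaces the whole step ...
pipe.set_params(classifier=LogisticRegression())
# ... and the double-underscore syntax sets a parameter of whatever is in the slot.
pipe.set_params(classifier__C=0.5)

print(type(pipe.named_steps['classifier']).__name__)  # LogisticRegression
print(pipe.named_steps['classifier'].C)  # 0.5
```

So any estimator could have been used as the placeholder; a pipeline also makes it easy to put preprocessing steps in front of the classifier and tune them in the same search.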
Background: In my desired workflow, I need to train a doc2vec model (gensim) and use its document vectors with 3 different classifiers. The parameters of both the doc2vec model and the classifiers should be tuned with GridSearch. I'd like to store the results in a table and save the best model, i.e. the one with the highest accuracy.
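The described workflow can be sketched as a two-step pipeline: a text-to-vector step followed by the swappable classifier slot, with the vectorizer's parameters repeated in every grid dict so they are tuned jointly with each of the 3 classifiers. This is a sketch under assumptions: gensim's `Doc2Vec` is not a scikit-learn estimator, so using it here would require writing a custom transformer wrapper (not shown); `TfidfVectorizer` is swapped in as a stand-in. The toy corpus, labels, chosen classifiers, and file names `grid_results.csv` / `best_model.joblib` are all hypothetical.

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 0 = sports, 1 = tech.
texts = [
    "the match ended with a late goal", "the striker scored twice in the final",
    "fans cheered as the team won the cup", "the coach praised the defence after the game",
    "a penalty decided the championship match", "the goalkeeper saved a late penalty",
    "the new phone has a faster processor", "the laptop battery lasts all day",
    "this software update fixes several bugs", "the chip maker announced a new processor",
    "the app crashed after the latest update", "developers released a patch for the bug",
]
labels = [0] * 6 + [1] * 6

# 'vectorizer' stands in for a doc2vec step; 'classifier' is the swappable slot.
pipe = Pipeline([('vectorizer', TfidfVectorizer()), ('classifier', LogisticRegression())])

# Vectorizer parameters go into every dict so they are tuned with each classifier.
vec_grid = {'vectorizer__ngram_range': [(1, 1), (1, 2)]}
search_space = [
    {**vec_grid, 'classifier': [LogisticRegression(solver='liblinear')],
     'classifier__C': [0.1, 1, 10]},
    {**vec_grid, 'classifier': [MultinomialNB()],
     'classifier__alpha': [0.1, 1.0]},
    {**vec_grid, 'classifier': [RandomForestClassifier(random_state=0)],
     'classifier__n_estimators': [50, 100]},
]

search = GridSearchCV(pipe, search_space, scoring='accuracy', cv=3)
search.fit(texts, labels)

# One row per candidate (vectorizer settings x classifier settings), best first.
table = pd.DataFrame(search.cv_results_)[['params', 'mean_test_score', 'rank_test_score']]
table = table.sort_values('rank_test_score')
table.to_csv('grid_results.csv', index=False)
print(table)

# best_estimator_ is the whole pipeline (vectorizer + classifier), refit on all data.
joblib.dump(search.best_estimator_, 'best_model.joblib')
reloaded = joblib.load('best_model.joblib')
print(reloaded.predict(["the team scored a late goal"]))
```

Saving `best_estimator_` rather than only the classifier keeps the fitted vectorizer together with it, so the reloaded object can be applied directly to raw text.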