I am trying different ML models, all using a pipeline which includes a transformer and an algorithm, 'nested' in a GridSearchCV to find the best hyperparameters.

When running Ridge, Lasso and ElasticNet regressions, I would like to store all the computed coefficients, not only the best_estimator_ coefficients, so that I can plot them along the alpha path. In other words, whenever GridSearchCV changes the alpha parameter and fits a new model, I would like to store the resulting coefficients and plot them against the alpha values.

You can take a look at this official scikit-learn example for a beautiful illustration.

This is my code:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_absolute_error, mean_squared_error
import time
start = time.time()

# Cross-validated - Ridge Regression
model_ridge = make_pipeline(transformer, Ridge()) # my transformer is already defined 

alphas = np.logspace(-5, 5, num = 50)
params = {'ridge__alpha' : alphas}
    
grid = GridSearchCV(model_ridge, param_grid = params, cv=10)
grid.fit(X_train, y_train)
regressor = grid.estimator.named_steps['ridge'].coef_ # when I add this line, it returns an error
    
stop = time.time()
training_time = stop-start

y_pred = grid.predict(X_test)

Ridge_Regression_results = {'Algorithm' : 'Ridge Regression', 
                             'R²' : grid.score(X_train, y_train), 
                             'MAE' : mean_absolute_error(y_test, y_pred), 
                             'RMSE' : np.sqrt(mean_squared_error(y_test, y_pred)),
                             'Training time (sec)' : training_time}

In this topic: return coefficients from Pipeline object in sklearn, the author was advised to use the named_steps attribute of the pipeline. But when I try to use it in my case, it raises the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18260/3310195105.py in <module>
     13 
     14 grid.fit(X_train, y_train)
---> 15 regressor = grid.estimator.named_steps['ridge'].coef_
     16 
     17 

AttributeError: 'Ridge' object has no attribute 'coef_'

I don't understand why this is happening.

For this to work, my guess is that this storing should happen during the GridSearchCV loop, but I can't figure out how to do this.

blackraven
Greg Mansio
    https://stackoverflow.com/questions/65359261/can-you-get-all-estimators-from-an-sklearn-grid-search-gridsearchcv might help. Long story short, there's no way you can get them with hyperparameter tuners like `GridSearchCV`, as they do not store all the fitted estimators. – amiola Aug 22 '22 at 13:28
    Alright then, thanks for the very efficient, if sad, answer, haha. – Greg Mansio Aug 22 '22 at 13:31
    The error you're seeing is because `grid.estimator` is the unfitted input pipeline. You can use `grid.best_estimator_.named_steps` to get the final refitted best-hyperparameter object, but that doesn't answer your main question. – Ben Reiniger Aug 22 '22 at 13:50
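
The distinction drawn in the last comment can be checked directly. Below is a minimal sketch, not the asker's actual setup: the data comes from `make_regression` and a `StandardScaler` stands in for the undefined `transformer`, both of which are assumptions made only so the snippet runs on its own.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data and StandardScaler replace the question's
# X_train, y_train and transformer, which are not shown.
X, y = make_regression(n_samples=100, n_features=4, noise=1.0, random_state=0)

pipe = make_pipeline(StandardScaler(), Ridge())
grid = GridSearchCV(pipe, param_grid={"ridge__alpha": np.logspace(-2, 2, 5)}, cv=5)
grid.fit(X, y)

# grid.estimator is the pipeline that was passed in. GridSearchCV fits clones
# of it, so the original Ridge step never gets a coef_ attribute.
unfitted_has_coef = hasattr(grid.estimator.named_steps["ridge"], "coef_")

# best_estimator_ is a clone refit on all of X, y with the best alpha
# (this requires refit=True, which is the default).
best_coefs = grid.best_estimator_.named_steps["ridge"].coef_

print(unfitted_has_coef)  # False
print(best_coefs.shape)   # (4,)
```

This only recovers the coefficients for the single best alpha, so it explains the AttributeError but does not give the full coefficient path.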

1 Answer

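You can get the coefficients by returning them as "scores" from a custom scoring callable, although that is not really what a score is meant to be. The values for every candidate alpha then end up in cv_results_.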

import pandas as pd

def myscores(estimator, X, y):
    r2 = estimator.score(X, y)
    coefs = estimator.named_steps["ridge"].coef_ 
    ret_dict = {
        f'a_{i}': coef for i, coef in enumerate(coefs)
    }
    ret_dict['r2'] = r2
    return ret_dict

grid = GridSearchCV(
    model_ridge,
    param_grid=params,
    scoring=myscores,
    refit='r2'
)
grid.fit(X_train, y_train)  # cv_results_ only exists after fitting

print(pd.DataFrame(grid.cv_results_))
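
To turn those results into something that can be plotted against alpha, the coefficient columns can be pulled out of `cv_results_`. The following is a self-contained sketch under stated assumptions: the synthetic data, the `StandardScaler` step, and the name `coef_scorer` are illustrative and not from the question.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data in place of the question's X_train, y_train.
X, y = make_regression(n_samples=100, n_features=4, noise=1.0, random_state=0)

def coef_scorer(estimator, X, y):
    # Report each Ridge coefficient as a separate "metric", plus the real R².
    scores = {f"a_{i}": c for i, c in enumerate(estimator.named_steps["ridge"].coef_)}
    scores["r2"] = estimator.score(X, y)
    return scores

alphas = np.logspace(-2, 2, 5)
grid = GridSearchCV(
    make_pipeline(StandardScaler(), Ridge()),  # StandardScaler is a stand-in transformer
    param_grid={"ridge__alpha": alphas},
    scoring=coef_scorer,
    refit="r2",  # with several metrics, refit must name the one used for selection
    cv=5,
)
grid.fit(X, y)

results = pd.DataFrame(grid.cv_results_)

# One row per alpha, one column per coefficient (averaged over the CV folds).
coef_cols = [f"mean_test_a_{i}" for i in range(X.shape[1])]
path = results.set_index("param_ridge__alpha")[coef_cols]
print(path.shape)  # (5, 4)
```

Note that each `mean_test_a_i` value is the average of the coefficients fitted on the individual training folds, not a coefficient from a model fitted on the full training set; the per-fold values are in the `split0_test_a_i`, `split1_test_a_i`, ... columns. The `path` frame can then be plotted against its index on a log-scaled x axis.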
Ben Reiniger