I have a GridSearchCV object that I created with
grid_search = GridSearchCV(pred_home_pipeline, param_grid)
I would like to save the entire grid-search object so I can explore the model-tuning results later; I do not want to save only best_estimator_. But after dumping and reloading, the reloaded and original grid_search objects differ in some way that I cannot track down.
import pickle
# save to disk
with open(filepath, 'wb') as handle:
    pickle.dump(grid_search, handle, protocol=pickle.HIGHEST_PROTOCOL)
# reload
with open(filepath, 'rb') as handle:
    grid_reloaded = pickle.load(handle)
# test object is unchanged after dump/reload
print(grid_search == grid_reloaded)
False
Weird. The outputs of print(grid_search) and print(grid_reloaded) certainly look the same.
And they create the exact same set of 525 predicted values for data I held out entirely from the grid-search process:
grid_search_preds = grid_search.predict(X_test)
grid_reloaded_preds = grid_reloaded.predict(X_test)
(grid_search_preds == grid_reloaded_preds).all()
True
...even though the best_estimator_ attributes do not compare as equal:
grid_search.best_estimator_ == grid_reloaded.best_estimator_
False
...although the best_estimator_ attributes also certainly look the same when comparing print(grid_search.best_estimator_) and print(grid_reloaded.best_estimator_).
What's going on here? Is it safe to save the GridSearchCV object for inspection later?
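For what it's worth, the same symptom can be reproduced with nothing but the standard library, which suggests the False results may come from how == works rather than from anything being lost in the pickle. This is only a sketch under an assumption: TinyModel is a hypothetical stand-in I made up for an estimator, and I am assuming that GridSearchCV and the estimators in my pipeline, like TinyModel, do not define their own __eq__ method.

```python
import pickle


class TinyModel:
    """Hypothetical stand-in for an estimator; it defines no __eq__."""

    def __init__(self, coef):
        self.coef = coef

    def predict(self, xs):
        return [self.coef * x for x in xs]


original = TinyModel(coef=2.0)
# Round-trip through pickle in memory instead of through a file.
reloaded = pickle.loads(pickle.dumps(original, protocol=pickle.HIGHEST_PROTOCOL))

# Without __eq__, == falls back to object identity, so two distinct
# objects compare unequal even when their contents match.
objects_equal = original == reloaded
# The attributes that were actually pickled are the same.
state_equal = vars(original) == vars(reloaded)
# Predictions agree as well.
preds_equal = original.predict([1, 2, 3]) == reloaded.predict([1, 2, 3])

print(objects_equal, state_equal, preds_equal)  # False True True
```

If that assumption holds for scikit-learn objects too, then == on the grid-search objects would not be a meaningful test of whether the round trip preserved them, and comparing their stored attributes or predictions (as I did above with predict) would be the more informative check.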