21

I have a scikit-learn pipeline with a KerasRegressor in it:

estimators = [
    ('standardize', StandardScaler()),
    ('mlp', KerasRegressor(build_fn=baseline_model, nb_epoch=5, batch_size=1000, verbose=1))
    ]
pipeline = Pipeline(estimators)

After training the pipeline, I am trying to save it to disk using joblib:

joblib.dump(pipeline, filename, compress=9)

But I am getting an error:

RuntimeError: maximum recursion depth exceeded

How would you save the pipeline to disk?

Dror Hilman
  • You could look at dill. Maybe it works https://pypi.python.org/pypi/dill – Moritz Jun 23 '16 at 07:05
  • You should simply increase the value of maximum recursion depth: http://stackoverflow.com/questions/3323001/maximum-recursion-depth – user1808924 Jun 23 '16 at 07:48
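For reference, the recursion-limit workaround suggested in the last comment is a one-liner (a sketch only; raising the limit can mask rather than fix the underlying pickling problem):

import sys

import joblib  # older scikit-learn versions expose this as sklearn.externals.joblib

# The default recursion limit is 1000; raise it before dumping the pipeline.
sys.setrecursionlimit(10000)
joblib.dump(pipeline, filename, compress=9)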

2 Answers

33

I struggled with the same problem, as there is no direct way to do this. Here is a hack that worked for me: I saved my pipeline into two files. The first file stores a pickled object of the sklearn pipeline, and the second one stores the Keras model:

...
from keras.models import load_model
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.externals import joblib  # on scikit-learn >= 0.23, use `import joblib` instead
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

...

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('estimator', KerasRegressor(build_model))
])

pipeline.fit(X_train, y_train)

# Save the Keras model first:
pipeline.named_steps['estimator'].model.save('keras_model.h5')

# This hack allows us to save the sklearn pipeline:
pipeline.named_steps['estimator'].model = None

# Finally, save the pipeline:
joblib.dump(pipeline, 'sklearn_pipeline.pkl')

del pipeline

And here is how the model could be loaded back:

# Load the pipeline first:
pipeline = joblib.load('sklearn_pipeline.pkl')

# Then, load the Keras model:
pipeline.named_steps['estimator'].model = load_model('keras_model.h5')

y_pred = pipeline.predict(X_test)
constt
  • I tried this approach with KerasClassifier and I got error: 'KerasClassifier' object has no attribute 'save'. Are you sure you were not actually doing pipeline.named_steps['estimator'].model.model.save('keras_model.h5') ? In this case, however, it seems that one would have to wrap the KerasClassifier object around the loaded model again. – JohnnyQ Nov 06 '17 at 09:02
  • Yes, I'm absolutely sure. Just checked once again, it works like a charm :) (python 3.5.2, keras 2.0.8, sklearn 0.19.1) – constt Nov 07 '17 at 12:15
  • Thanks a lot, worked like a charm! It's so simple, yet nobody had figured it out: just save the pipeline steps (except the Keras one) with pickle/joblib and save the Keras model with model.save. Great answer. – Pranzell Dec 17 '18 at 15:10
  • This has been a godsend. Thank you a ton!! – Asher11 Jan 14 '21 at 22:09
  • DO NOT define your `build_model` function as a local/nested function, otherwise you would get `PicklingError: Can't pickle <function outer_func_name.<locals>.build_model at 0xdeadbeef>: it's not found as module_name.outer_func_name.<locals>.build_model` – EasonL Aug 20 '21 at 00:02
1

Keras is not compatible with pickle out of the box. You can fix it if you are willing to monkey-patch it: https://github.com/tensorflow/tensorflow/pull/39609#issuecomment-683370566.
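For reference, the general shape of such a monkey patch (a minimal sketch only, not the exact code from that PR; make_keras_picklable is an illustrative name) is to round-trip the model through an HDF5 file whenever it is pickled:

import os
import tempfile

import keras.models


def make_keras_picklable():
    # Sketch: teach keras.models.Model how to pickle itself via HDF5.
    def __getstate__(self):
        # Serialize the model to a temporary HDF5 file and keep its bytes.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, 'model.h5')
            keras.models.save_model(self, path, overwrite=True)
            with open(path, 'rb') as fd:
                model_bytes = fd.read()
        return {'model_bytes': model_bytes}

    def __setstate__(self, state):
        # Write the bytes back to disk and rebuild the model from them.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, 'model.h5')
            with open(path, 'wb') as fd:
                fd.write(state['model_bytes'])
            model = keras.models.load_model(path)
        self.__dict__.update(model.__dict__)

    keras.models.Model.__getstate__ = __getstate__
    keras.models.Model.__setstate__ = __setstate__

Call make_keras_picklable() once at startup; after that, pickling a pipeline that contains a Keras wrapper has a chance of working without the two-file workaround from the other answer.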

You can also use the SciKeras library, which does this for you and is a drop-in replacement for KerasClassifier/KerasRegressor: https://github.com/adriangb/scikeras
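A rough usage sketch, assuming `build_model`, `X_train` and `y_train` as in the other answer above (the `model=` parameter name follows SciKeras's API; check the SciKeras docs for your version):

import joblib
from scikeras.wrappers import KerasRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('estimator', KerasRegressor(model=build_model, epochs=5, batch_size=1000, verbose=1)),
])

pipeline.fit(X_train, y_train)

# SciKeras wrappers support the pickle protocol, so the whole pipeline
# can be dumped and reloaded in one step:
joblib.dump(pipeline, 'sklearn_pipeline.pkl')
pipeline = joblib.load('sklearn_pipeline.pkl')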

Disclosure: I am the author of SciKeras as well as that PR.

LoveToCode