What is the recommended way to persist (pickle) custom sklearn pipelines?

Question

I have built an sklearn pipeline that combines a standard support vector regression component with some custom transformers that create features. This pipeline is then put into an object which is trained and then pickled (this seems to be the recommended way). The unpickled object is used to make predictions.
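Below is a minimal sketch of the setup described above. It is not the poster's code: the `SquaredFeatures` transformer is a hypothetical stand-in for the custom feature transformers, and the model is saved with a plain `pickle` round trip. One detail matters when a pipeline contains custom classes. Pickle stores only a reference to each class (module path plus name), not its code, so the process that loads the model must be able to import the class under that same path. In a real project the transformer should therefore live in an importable module, not in the script that pickles it.

```python
import pickle

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR


class SquaredFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: appends the square of every column."""

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.hstack([X, X ** 2])


def build_pipeline():
    # Custom feature step followed by a standard support vector regressor.
    return Pipeline([
        ("features", SquaredFeatures()),
        ("svr", SVR(kernel="rbf", C=1.0)),
    ])


rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(50, 2))
y = X[:, 0] ** 2 + X[:, 1]

model = build_pipeline().fit(X, y)
blob = pickle.dumps(model)      # stores class references, not class code
restored = pickle.loads(blob)   # re-imports SquaredFeatures, SVR, ... by name
print(np.allclose(model.predict(X), restored.predict(X)))
```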

For distribution, this is turned into an executable file with PyInstaller.

When I call the unpickled regression object from a unit test, it works fine.

However, when I attempt to use the PyInstaller binary to make predictions, I get a long stack trace that ends with:

 module = loader.load_module(fullname)
   File "messagestream.pxd", line 5, in init scipy.optimize._trlib._trlib
 ImportError: No module named 'scipy._lib.messagestream'
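A bundle that works in tests but fails in the binary can be made to report the failure earlier and more clearly. The sketch below shows one way to do this: the application tries to import the modules it depends on when it starts, so a module missing from the bundle is reported by name instead of deep inside unpickling. The helper names `is_frozen` and `missing_modules` are hypothetical. The only PyInstaller behaviour it relies on is that PyInstaller sets `sys.frozen` inside a bundle. The demo call uses standard-library names so that it runs anywhere.

```python
import importlib
import sys


def is_frozen():
    """True when running inside a PyInstaller bundle (PyInstaller sets sys.frozen)."""
    return getattr(sys, "frozen", False)


def missing_modules(names):
    """Return the subset of module names that cannot be imported here."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # includes ModuleNotFoundError
            missing.append(name)
    return missing


# In the application you would pass the modules it needs, e.g.
# missing_modules(["scipy._lib.messagestream"]), and exit with a clear
# message if anything is missing. Demo with standard-library names:
demo_missing = missing_modules(["json", "xml.dom", "no_such_module_for_demo"])
print("frozen:", bool(is_frozen()), "missing:", demo_missing)
```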

This feels like some kind of pickling error, probably due to the interaction of pickling with PyInstaller. How can I refactor my code so that my custom pipeline runs as easily and robustly after unpickling as a standard sklearn regressor does?

No, I am pickling with pickle, though I have also tried joblib and dill to no avail. — Roko Mijic, Nov 01 '17 at 19:14
Someone else has the same problem: https://stackoverflow.com/questions/47055712/error-when-executing-compiled-file-no-module-named-scipy-lib-messagestream — Roko Mijic, Nov 02 '17 at 08:29

Roko Mijic · Accepted Answer · score 9 · 2017-11-02T09:12:12.477

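OK, after some googling it seems the root cause is not pickling. It is a PyInstaller "hidden imports" issue. PyInstaller bundles only the modules it can find by analyzing import statements. Modules imported from inside compiled extensions are invisible to that analysis. In this case, the traceback shows the compiled scipy.optimize._trlib._trlib module importing scipy._lib.messagestream. For some reason the problem only shows up with the pickled model (don't ask me why).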
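One plausible reason the problem appears only with the pickled model is that unpickling imports, by name and at load time, every module the pickle refers to. Those code paths may never be exercised by the script's own import statements. The sketch below lists the modules a given pickle references. It uses a subclass of `pickle.Unpickler` that records each `find_class(module, name)` call. The class and function names are hypothetical. Run it in the normal (non-frozen) environment, because it really imports those modules. It reports only direct references. Modules pulled in indirectly by compiled extensions, such as `scipy._lib.messagestream`, will not appear, and still have to be added by hand.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction


class RecordingUnpickler(pickle.Unpickler):
    """Unpickler that records every (module, name) global the pickle loads."""

    def __init__(self, file):
        super().__init__(file)
        self.globals_seen = set()

    def find_class(self, module, name):
        self.globals_seen.add((module, name))
        return super().find_class(module, name)  # performs the real import


def modules_needed(data):
    """Return the set of module names that unpickling `data` imports directly."""
    unpickler = RecordingUnpickler(io.BytesIO(data))
    unpickler.load()
    return {module for module, _name in unpickler.globals_seen}


# Demo with standard-library classes; for a real model, pass the bytes of
# your pickle file, e.g. open("model.pkl", "rb").read() (hypothetical path).
sample = pickle.dumps([OrderedDict(a=1), Fraction(1, 3)])
needed = modules_needed(sample)
print(sorted(needed))
```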

The following solved the immediate issue for me: add this SciPy hidden import to the .spec file:

 hiddenimports=['scipy._lib.messagestream']

I also needed hidden imports for some other libraries:

 hiddenimports=['sklearn.neighbors.typedefs',
                'scipy._lib.messagestream',
                'pandas._libs.tslibs.timedeltas']
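For reference, here is a sketch of where `hiddenimports` sits in a .spec file. PyInstaller writes `<script name>.spec` to the directory you run it from the first time you call `pyinstaller` on a script. You can also generate the file with `pyi-makespec`. The names `your_app.spec` and `your_app.py` are placeholders, and a generated file contains more arguments than shown here. `Analysis` is provided by PyInstaller when it executes the spec, so this fragment does not run as a standalone Python script. After editing the file, rebuild with `pyinstaller your_app.spec`. The module names are the ones from this thread and may differ for other library versions.

```python
# your_app.spec (placeholder name): only the relevant part is shown.
# Analysis, PYZ and EXE are injected by PyInstaller when it runs this file.
a = Analysis(
    ['your_app.py'],
    hiddenimports=[
        'sklearn.neighbors.typedefs',
        'scipy._lib.messagestream',
        'pandas._libs.tslibs.timedeltas',
    ],
)
```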

edited Nov 02 '17 at 09:12

answered Nov 02 '17 at 09:04

Roko Mijic


Thanks for the trick. I didn't have to deal with that in May 2017. Is this a bug in the new version? – Stéphane Jan 09 '18 at 14:38
Works with py2exe too. – munieq11 Feb 21 '18 at 22:23
Where can I find the .spec file? I've looked in the .whl and in the environment, but I can't find it in either. – Daan Luttik Apr 30 '18 at 10:22

score 3 · Answer 2 · answered Sep 11 '18 at 17:23


If you want to do this with a command-line argument instead of editing the .spec file as in Roko's answer, the syntax is:

pyinstaller --hidden-import scipy._lib.messagestream --onefile your_python_file_here.py
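`--hidden-import` can be given more than once, one module per flag, so all the modules from the accepted answer can go on the command line. Below is a small sketch that assembles such a command in Python. The helper `pyinstaller_command` is hypothetical and the script name is a placeholder. The command is only printed, not executed.

```python
import shlex

# Module names from the accepted answer; adjust for your library versions.
HIDDEN_IMPORTS = [
    "sklearn.neighbors.typedefs",
    "scipy._lib.messagestream",
    "pandas._libs.tslibs.timedeltas",
]


def pyinstaller_command(script, hidden_imports, onefile=True):
    """Build the argv list for a PyInstaller call (hypothetical helper)."""
    cmd = ["pyinstaller"]
    for module in hidden_imports:
        # One --hidden-import flag per module; PyInstaller accepts it repeatedly.
        cmd += ["--hidden-import", module]
    if onefile:
        cmd.append("--onefile")
    cmd.append(script)
    return cmd


cmd = pyinstaller_command("your_python_file_here.py", HIDDEN_IMPORTS)
# Print a copy-pasteable shell command; to build directly instead,
# pass cmd to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```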


Braxvan

