
I have a random forest classifier stored in the object clf. In really simplified terms, I did the following:

# Import libraries
import pandas as pd
from sklearn.ensemble import RandomForestClassifier as rfc

# Import data
exog = pd.read_csv('train.csv')
trgt = pd.read_csv('target.csv')

# Declare classifier
clf = rfc(n_estimators=51, bootstrap=True, max_features=3)

# Fit classifier to data
clf.fit(exog, trgt)

I would like to export clf so I can reference it in another script. My goal is to load clf in a Python script running on a remote server, feed out-of-sample data into it, and have it return the predicted probabilities via clf.predict_proba(new_data).

My top priority is to avoid training the classifier every time I predict the probabilities for new datasets. Is there a way to export the tuned clf object?

This thread pointed me in the right direction, but the solution uses cPickle and it throws the following error:

TypeError: write() argument must be str, not bytes

Arturo Sbr
  • Does this answer your question? [Save classifier to disk in scikit-learn](https://stackoverflow.com/questions/10592605/save-classifier-to-disk-in-scikit-learn) – Arco Bast Mar 17 '20 at 21:36
  • It does. I was looking for a thread like this on SO but couldn't find it. I may edit the thread to add more tags to it or change the text to make it match with more searches. Additionally, I am confused as to how to save the file to disk. Is there a specific extension that has to be used? Finally, the answer on that thread uses `cPickle`. Is there a difference between the two? – Arturo Sbr Mar 17 '20 at 21:54
  • You do not have to use a specific extension, but you could if you wanted. Some people use '.pickle'. While pickle is pure python, cPickle is a C-extension, which supposedly makes it faster. Actually, I'd try cPickle first. I also do not think that you have to edit your question. – Arco Bast Mar 17 '20 at 22:06
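(Regarding the pickle vs. cPickle discussion in the comments above: on Python 3 there is no separate cPickle module any more; the standard pickle module uses the C implementation automatically whenever it is available. A quick check, assuming CPython:)

import pickle

# Python 3's pickle falls back on the C implementation (_pickle) when available,
# so there is no separate cPickle module to import.
print(pickle.Pickler)  # on CPython this prints <class '_pickle.Pickler'>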

3 Answers


There is plenty about model persistence in the sklearn documentation, but it advises you to use either pickle or joblib.

e.g. joblib

>>> from joblib import dump, load
>>> dump(clf, 'filename.joblib')
>>> clf = load('filename.joblib')

or pickle

>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)

From the docs:

In the specific case of scikit-learn, it may be better to use joblib’s replacement of pickle (dump & load), which is more efficient on objects that carry large numpy arrays internally as is often the case for fitted scikit-learn estimators, but can only pickle to the disk and not to a string:
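Applied to your setup, a rough sketch (the file and data names below are just placeholders): dump the fitted clf locally, copy the .joblib file to the remote server, and load it there before calling predict_proba.

# Locally, after fitting clf
from joblib import dump
dump(clf, 'clf.joblib')

# On the remote server, after copying clf.joblib over
import pandas as pd
from joblib import load

clf = load('clf.joblib')
new_data = pd.read_csv('new_data.csv')  # placeholder for the out-of-sample data
scores = clf.predict_proba(new_data)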

Adam

This code snippet will work for you:

import pickle
# save the model to disk
filename = 'finalized_model.sav'
pickle.dump(clf, open(filename, 'wb'))

# some time later...

# load the model from disk
loaded_model = pickle.load(open(filename, 'rb'))
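# X_test and Y_test are a held-out test set (not defined in this snippet)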
result = loaded_model.score(X_test, Y_test)
print(result)

The snippet above is from this source.

Your question has a duplicate.
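For the specific goal in the question (probabilities for new data rather than an accuracy score), the loaded model can be used the same way; X_new and new_data.csv below are placeholders for the out-of-sample data:

import pandas as pd
import pickle

# Load the classifier that was pickled above
with open('finalized_model.sav', 'rb') as f:
    loaded_model = pickle.load(f)

# The columns of the new data must match the columns used to fit the model
X_new = pd.read_csv('new_data.csv')
probabilities = loaded_model.predict_proba(X_new)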


You can serialize the object with pickle or cloudpickle. This will work as long as the package versions in your remote and local environments are the same (a rough version check is sketched at the end of this answer).

For saving:

import pickle

# Open the file in binary mode ('wb'); pickle writes bytes, which is what
# caused the "write() argument must be str, not bytes" error in the question
with open('/path/to/file', 'wb') as f:
    pickle.dump(clf, f)

For loading:

import pickle

# Read back in binary mode ('rb') to match how the file was written
with open('/path/to/file', 'rb') as f:
    clf = pickle.load(f)
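One rough safeguard against the version mismatch mentioned above is to store the scikit-learn version next to the model and compare the two at load time. This is just a sketch, not part of the original answer:

import pickle
import sklearn

# Save the model together with the scikit-learn version used to train it
with open('/path/to/file', 'wb') as f:
    pickle.dump({'sklearn_version': sklearn.__version__, 'model': clf}, f)

# Later, on the remote server
with open('/path/to/file', 'rb') as f:
    payload = pickle.load(f)

if payload['sklearn_version'] != sklearn.__version__:
    print('Warning: model was trained with scikit-learn', payload['sklearn_version'])

clf = payload['model']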
Arco Bast