
I have trained a prediction model using scikit-learn and used pickle to save it to disk. The pickle file is 58 MB, which is quite sizable.

To use the model, I wrote something like this:

import pickle

def loadModel(pkl_fn):
    # Pickle files must be opened in binary mode.
    with open(pkl_fn, 'rb') as f:
        return pickle.load(f)


if __name__ == "__main__":
    import sys
    feature_vals = read_features(sys.argv[1])
    model = loadModel("./model.pkl")
    # predict 
    # model.predict(feature_vals)

I am wondering about efficiency when running the program many times from the command line.

Pickle files are supposed to load quickly, but is there any way to speed this up further? Can I compile the whole thing into a binary executable?

  • Can you give use more details about your use case? From what I understand you are running this program every time you want to make a prediction, how often does this happen? – ldirer Jul 23 '15 at 19:50
  • Is there a reason why you can't run the loading code once and then use it for all of your predictions? Why must you also run the loading code for each prediction if it loads the same thing? Even if you speed up the loading, this usage method will still cause some slowdowns, so I would look into avoiding the multiple loadings. – IVlad Jul 23 '15 at 20:47
  • @IVlad , I am writing this small tool for some bio-physics people, "load one, predict one" is what they asked for. – GeauxEric Jul 23 '15 at 21:14

1 Answer


If you are worried about loading time, you can use joblib.dump and joblib.load; they are more efficient than pickle for scikit-learn models, which typically contain large NumPy arrays.

For a full (pretty straightforward) example see the docs or this related answer from ogrisel: Save classifier to disk in scikit-learn
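A minimal sketch of the joblib round trip (a large NumPy array stands in for a fitted estimator here so the snippet runs standalone; with a real model you would pass the estimator object instead):

```python
import numpy as np
from joblib import dump, load

# Stand-in for a fitted scikit-learn estimator, which is mostly
# large numpy arrays internally.
model = np.random.rand(1000, 100)

# joblib stores numpy arrays efficiently; compress=3 optionally
# trades some speed for a smaller file on disk.
dump(model, "model.joblib")
restored = load("model.joblib")

assert np.array_equal(model, restored)
```

Note that, like pickle, a joblib file should only be loaded from a trusted source, and the same library versions should be used for dumping and loading.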
