
I am trying to fit an ordinary least squares model using Python's statsmodels library, as described here.

`sm.OLS(...).fit()` returns the fitted model (a results object). Is there a way to save it to a file and reload it later? My training data is huge and fitting the model takes around half a minute, so I was wondering whether the OLS model has any save/load capability.

I tried calling `repr()` on the model object, but it does not return any useful information.




The model and results instances all have `save` and `load` methods, so you don't need to use the pickle module directly.

Edit to add an example:

```python
import statsmodels.api as sm

data = sm.datasets.longley.load_pandas()
data.exog['constant'] = 1

results = sm.OLS(data.endog, data.exog).fit()
results.save("longley_results.pickle")

# we should probably add a generic load to the main namespace
from statsmodels.regression.linear_model import OLSResults
new_results = OLSResults.load("longley_results.pickle")

# or more generally
from statsmodels.iolib.smpickle import load_pickle
new_results = load_pickle("longley_results.pickle")
```

Edit 2: We've now added a `load` function to the main statsmodels API in master, so you can just do

```python
new_results = sm.load('longley_results.pickle')
```
  • Additionally, if you use the pickled results and model only for prediction, you can strip the training data before saving (but many methods will no longer work): statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.RegressionResults.save.html – Josef May 16 '13 at 11:08
  • @jseabold could you give an example? – Nik May 17 '13 at 21:15
  • Sure. Edited to add an example. – jseabold May 18 '13 at 22:08
  • jseabold: I tried the `sm.load` method but the interpreter complains that the module does not have a 'load' attribute. Is there a new version of statsmodels that I should be using? – Nik Jun 04 '13 at 06:40
  • It is in master on github and will be in the next release. You need to install from source if you want to use it now. – jseabold Jun 06 '13 at 14:06
  • Is there any alternative strategy, for example saving to a JSON file? – Denis C Aug 29 '13 at 11:52
  • You can use the json module (or pandasjson) just as you would the pickle module to dump results objects to json. We have plans to make something built-in for the next release. – jseabold Aug 30 '13 at 14:54

I've installed the statsmodels library and found that you can save the results using Python's pickle module.

Models and results are pickleable via save/load, optionally saving the model data. [source]

As an example, assuming the fitted results are stored in the variable `results`:

To save them to a file:

```python
import pickle

# pickle writes bytes, so the file must be opened in binary mode ("wb")
with open('learned_model.pkl', 'wb') as f:
    pickle.dump(results, f)
```

To read the file:

```python
import pickle

# read the pickled results back in binary mode ("rb")
with open('learned_model.pkl', 'rb') as f:
    model_results = pickle.load(f)
```