I've run a regression 5,693 times and want to save the output, since it took several hours to run. I've captured the results in a list called res; each element (if it matters) is a MarkovRegressionResultsWrapper object from the statsmodels package.

I thought the way to go was pickle. I'm saving to a private directory for my own use, so security isn't an issue, and JSON doesn't seem to work for objects (I'm new, so perhaps this is wrong?).

Here is an example I found that works fine:

import pickle
a = ['test value','test value 2','test value 3']

file_Name = "testfile"
# open the file for writing
fileObject = open(file_Name,'wb') 

# this writes the object a to the
# file named 'testfile'
pickle.dump(a,fileObject)   

# here we close the fileObject
fileObject.close()

However, when I use the exact same code, but save my list res, it gives an error:

file_Name = "testfile"
# open the file for writing
fileObject = open(file_Name,'wb') 

# this writes the object a to the
# file named 'testfile'
pickle.dump(res,fileObject)   

# here we close the fileObject
fileObject.close()

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-43-ab4800ac1a51> in <module>()
      7 # this writes the object a to the
      8 # file named 'testfile'
----> 9 pickle.dump(res,fileObject)
     10 
     11 # here we close the fileObject

OSError: [Errno 22] Invalid argument

I'm using Python 3.6 with Jupyter Notebook on a MacBook Pro. Both a and res are of type list, so the only difference is what the list contains. Why am I getting this error? Is this the best way to save this list of objects, or should I be doing something different?

Jesse Blocher
  • what is `res` in your `pickle.dump`? – PYA Jul 05 '17 at 15:22
  • `res` is a list that contains 5,693 `MarkovRegressionResultsWrapper` objects – Jesse Blocher Jul 05 '17 at 15:22
  • `OSError` suggests that it is not necessarily your Python code that is the issue here, but this is a system error message ([doc](https://docs.python.org/2/library/exceptions.html#exceptions.OSError)). How large is that file you are writing? – patrick Jul 05 '17 at 15:24
  • I believe the error you are getting is a [known bug in `pickle`](https://stackoverflow.com/questions/31468117/python-3-can-pickle-handle-byte-objects-larger-than-4gb) regarding the size of the object passed in. – Christian Dean Jul 05 '17 at 15:24
  • @ChristianDean Dang, that's good to know. But if I read the post correctly, this is a Mac-only issue, no? – patrick Jul 05 '17 at 15:30
  • `sys.getsizeof(res)` tells me 48464, which I presume is in bytes, so not very big. – Jesse Blocher Jul 05 '17 at 15:30
  • @JesseBlocher You're right, it's not that big. But pickle still seems to be choking on it due to the bug. – Christian Dean Jul 05 '17 at 15:33
  • @ChristianDean What other options do I have? I have access to a Unix cluster to run my code, but I'd prefer not to do that since it took a few hours to run. I have the object in memory on my Mac, is there any other way to save it? – Jesse Blocher Jul 05 '17 at 15:33
  • @JesseBlocher The link I posted in my earlier comment provided several solutions. One solution I can think of off the top of my head is to simply read the file in chunks. I believe one of the answers demonstrates this method. – Christian Dean Jul 05 '17 at 15:37
  • @patrick Yup, it's Mac only. Windows and Linux users should be unaffected. – Christian Dean Jul 05 '17 at 15:40

1 Answer

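@ChristianDean provided the answer in the comments. This is related to a known bug in `pickle` in Python 3.6 that affects macOS only. See [Python 3 - Can pickle handle byte objects larger than 4GB?](https://stackoverflow.com/questions/31468117/python-3-can-pickle-handle-byte-objects-larger-than-4gb)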
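
For anyone landing here, the chunked workaround suggested in the comments can be sketched roughly as follows. This is a minimal sketch, not a definitive fix: it assumes the failure comes from a single write or read call exceeding about 2 GiB, and the names `dump_in_chunks`, `load_in_chunks`, and `MAX_BYTES` are hypothetical helpers, not part of `pickle`. Note also that `sys.getsizeof(res)` reports only the list object itself, not the objects it references, so the pickled payload can be far larger than 48464 bytes; the length returned by `dump_in_chunks` is the real size on disk.

```python
import os
import pickle
import tempfile

# Assumption: each individual write/read must stay below 2 GiB.
MAX_BYTES = 2**31 - 1


def dump_in_chunks(obj, path, chunk_size=MAX_BYTES):
    """Pickle obj to path, writing at most chunk_size bytes per call."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "wb") as f:
        for start in range(0, len(data), chunk_size):
            f.write(data[start:start + chunk_size])
    return len(data)  # actual size of the pickled payload in bytes


def load_in_chunks(path, chunk_size=MAX_BYTES):
    """Read path back at most chunk_size bytes per call, then unpickle."""
    size = os.path.getsize(path)
    buf = bytearray()
    with open(path, "rb") as f:
        while len(buf) < size:
            chunk = f.read(min(chunk_size, size - len(buf)))
            if not chunk:  # unexpected end of file
                break
            buf += chunk
    return pickle.loads(buf)


# Small demonstration with a stand-in list; a tiny chunk size forces
# several writes so the chunking path is actually exercised.
demo_path = os.path.join(tempfile.mkdtemp(), "testfile")
demo = ["test value", "test value 2", "test value 3"]
demo_size = dump_in_chunks(demo, demo_path, chunk_size=16)
demo_back = load_in_chunks(demo_path, chunk_size=16)
```

With the real data, the calls would be `dump_in_chunks(res, "testfile")` and later `res = load_in_chunks("testfile")`. The whole pickle is held in memory before writing, so this needs enough RAM for both the objects and their serialized bytes.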

Jesse Blocher