Problem Statement
I'm using Python 3 and trying to pickle a dictionary of IntervalTrees that takes up roughly 2 to 3 GB. This is my console output:
10:39:25 - project: INFO - Checking if motifs file was generated by pickle...
10:39:25 - project: INFO - - Motifs file does not seem to have been generated by pickle, proceeding to parse...
10:39:38 - project: INFO - - Parse complete, constructing IntervalTrees...
11:04:05 - project: INFO - - IntervalTree construction complete, saving pickle file for next time.
Traceback (most recent call last):
  File "/Users/alex/Documents/project/src/project.py", line 522, in dict_of_IntervalTree_from_motifs_file
    save_as_pickled_object(motifs, output_dir + 'motifs_IntervalTree_dictionary.pickle')
  File "/Users/alex/Documents/project/src/project.py", line 269, in save_as_pickled_object
    def save_as_pickled_object(object, filepath): return pickle.dump(object, open(filepath, "wb"))
OSError: [Errno 22] Invalid argument
The line in which I attempt the save is
def save_as_pickled_object(object, filepath): return pickle.dump(object, open(filepath, "wb"))
The error comes about 15 minutes after save_as_pickled_object is invoked (at 11:20).
I tried this with a much smaller subsection of the motifs file and it worked fine with exactly the same code, so it must be an issue of scale. Are there any known bugs with pickle in Python 3.6 relating to the size of what you try to pickle? Are there known bugs with pickling large files in general? Are there any known ways around this?
Thanks!
Update: This question might be a duplicate of "Python 3 - Can pickle handle byte objects larger than 4GB?"
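To check whether the serialized size really is what crosses the limit, one option is to count the bytes pickle would emit without storing or writing them. The following is only a sketch: `ByteCounter` and `pickled_size` are hypothetical names that do not appear in the project, and running it on the real dictionary takes about as long as pickling it.

```python
import pickle


class ByteCounter:
    """Write-only file-like object that counts bytes instead of storing them."""

    def __init__(self):
        self.n_bytes = 0

    def write(self, data):
        # pickle.dump only needs a write() method that accepts bytes-like data.
        n = memoryview(data).nbytes
        self.n_bytes += n
        return n


def pickled_size(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Return how many bytes pickle.dump would produce for obj."""
    counter = ByteCounter()
    pickle.dump(obj, counter, protocol=protocol)
    return counter.n_bytes


# Hypothetical stand-in for the real dictionary of IntervalTrees.
sample = {"chr1": list(range(1000)), "chr2": ["motif"] * 50}
size = pickled_size(sample)
print(size, "bytes; above 2**31 - 1:", size > 2**31 - 1)
```

If the reported size is above 2**31 - 1 bytes, a single write call would have to handle more than 2 GB, which fits the small file working and the large one failing.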
Solution
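This is the code I used instead. It serializes the object to bytes with pickle.dumps and writes them in chunks of at most 2**31 - 1 bytes, so no single write (or read) call handles more than 2 GB.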
import os
import pickle

def save_as_pickled_object(obj, filepath):
    """
    This is a defensive way to write pickle.dump, allowing for very large files on all platforms
    """
    max_bytes = 2**31 - 1  # largest chunk passed to a single write call
    bytes_out = pickle.dumps(obj)
    n_bytes = len(bytes_out)  # length of the serialized payload in bytes
    with open(filepath, 'wb') as f_out:
        for idx in range(0, n_bytes, max_bytes):
            f_out.write(bytes_out[idx:idx+max_bytes])

def try_to_load_as_pickled_object_or_None(filepath):
    """
    This is a defensive way to write pickle.load, allowing for very large files on all platforms
    """
    max_bytes = 2**31 - 1  # largest chunk requested from a single read call
    try:
        input_size = os.path.getsize(filepath)
        bytes_in = bytearray(0)
        with open(filepath, 'rb') as f_in:
            for _ in range(0, input_size, max_bytes):
                bytes_in += f_in.read(max_bytes)
        obj = pickle.loads(bytes_in)
    except Exception:  # missing file, truncated or corrupt data, etc.
        return None
    return obj
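The chunking logic can be exercised on small data by making the chunk size a parameter. The sketch below is an illustration rather than the code above: `write_in_chunks`, `read_in_chunks`, and the `max_bytes` argument are hypothetical names, and passing `protocol=pickle.HIGHEST_PROTOCOL` is an assumption made here on the grounds that newer protocols handle large objects better.

```python
import os
import pickle
import tempfile

DEFAULT_MAX_BYTES = 2**31 - 1  # same chunk size as the functions above


def write_in_chunks(obj, filepath, max_bytes=DEFAULT_MAX_BYTES):
    """Pickle obj and write the bytes in slices of at most max_bytes."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    with open(filepath, "wb") as f_out:
        for idx in range(0, len(data), max_bytes):
            f_out.write(data[idx:idx + max_bytes])
    return len(data)


def read_in_chunks(filepath, max_bytes=DEFAULT_MAX_BYTES):
    """Read the file in slices of at most max_bytes, then unpickle."""
    remaining = os.path.getsize(filepath)
    data = bytearray()
    with open(filepath, "rb") as f_in:
        while remaining > 0:
            chunk = f_in.read(min(max_bytes, remaining))
            if not chunk:  # file ended earlier than expected
                break
            data += chunk
            remaining -= len(chunk)
    return pickle.loads(data)


# Round trip with a tiny chunk size so that several chunks are written.
example = {"chr1": [(10, 20, "motif_a"), (30, 45, "motif_b")], "chr2": []}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.pickle")
    write_in_chunks(example, path, max_bytes=7)
    assert read_in_chunks(path, max_bytes=7) == example
```

Note that this approach holds the entire serialized payload in memory in addition to the object itself, so peak memory use is roughly the object plus its pickled size.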