I am trying to save variables to a file so they can be loaded again later. This is often done with pickle, so that's what I started with. I took a small sample of my data to see if it would save. This sample amounted to 20 MB. My total data was around 205 times larger (4001 MB). The sample saved, but when I tried to save the full data I ran into OSError: [Errno 22] Invalid argument

After further exploration, I found out that pickle has a bug that does not allow you to produce a file larger than 4 GB. This is a little less than the amount of storage my data takes up.

Stated here: Any idea with "OSError: [Errno 22] Invalid argument" in pickle.dump?

Here it is stated that it's a problem on OS X, which I am using: https://bugs.python.org/issue24658

I found this bit of code but could not understand it:

import pickle
 
class MacOSFile(object):
 
    def __init__(self, f):
        self.f = f
 
    def __getattr__(self, item):
        return getattr(self.f, item)
 
    def read(self, n):
        # print("reading total_bytes=%s" % n, flush=True)
        if n >= (1 << 31):
            buffer = bytearray(n)
            idx = 0
            while idx < n:
                batch_size = min(n - idx, 1 << 31 - 1)
                # print("reading bytes [%s,%s)..." % (idx, idx + batch_size), end="", flush=True)
                buffer[idx:idx + batch_size] = self.f.read(batch_size)
                # print("done.", flush=True)
                idx += batch_size
            return buffer
        return self.f.read(n)
 
    def write(self, buffer):
        n = len(buffer)
        print("writing total_bytes=%s..." % n, flush=True)
        idx = 0
        while idx < n:
            batch_size = min(n - idx, 1 << 31 - 1)
            print("writing bytes [%s, %s)... " % (idx, idx + batch_size), end="", flush=True)
            self.f.write(buffer[idx:idx + batch_size])
            print("done.", flush=True)
            idx += batch_size

def pickle_dump(obj, file_path):
    with open(file_path, "wb") as f:
        return pickle.dump(obj, MacOSFile(f), protocol=pickle.HIGHEST_PROTOCOL)
 
 
def pickle_load(file_path):
    with open(file_path, "rb") as f:

        return pickle.load(MacOSFile(f))

In my own code I used a simple dump:

with open("file.pickle", "wb") as f:
    pickle.dump((boards, value), f)

I was wondering if someone would be able to explain what the code provided above does and how it works, if it does. Source: https://www.programmersought.com/article/3832726678/

A simple way to recreate this would be to create a massive list and save it.
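
A minimal sketch of such a reproduction is below. The helper name `try_dump` is hypothetical, and it uses one large `bytes` object rather than a list, on the assumption that a single big object leads to a single big write. Whether the large case actually fails depends on the OS and Python build, so the sketch only runs a small payload and leaves the large call as a comment.

```python
import os
import pickle
import tempfile


def try_dump(n_bytes):
    """Pickle n_bytes of zeros to a temp file.

    Returns None on success, or the OSError that pickle.dump raised.
    """
    payload = bytes(n_bytes)
    fd, path = tempfile.mkstemp(suffix=".pickle")
    try:
        with os.fdopen(fd, "wb") as f:
            try:
                pickle.dump(payload, f, protocol=pickle.HIGHEST_PROTOCOL)
            except OSError as exc:
                return exc
        return None
    finally:
        # Always clean up the temp file, whether or not the dump worked.
        os.remove(path)


# A small payload is expected to succeed on any platform.
small_result = try_dump(1024)

# To attempt the failing case, call try_dump(2**31 + 1) on the affected
# machine. This needs several GB of free RAM and disk space.
```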

pygo
  • The claim is that the underlying `write` on OSX fails when writes are over `INT_MAX` in [`limits.h`](https://opensource.apple.com/source/xnu/xnu-124.8/EXTERNAL_HEADERS/machine/limits.h.auto.html). This code writes in smaller chunks to avoid that error. It seems to be buggy because it doesn't check the actual write / read counts against requested. But fix those bugs and it may work. – tdelaney Oct 13 '20 at 19:33
  • @tdelaney thank you so much it works great. – pygo Oct 13 '20 at 19:38

1 Answer

This code takes a minute to run, but it will save the data. Whenever you want to read or write the pickle, use the provided `pickle_load` and `pickle_dump` functions instead of calling pickle directly:

import pickle


class MacOSFile(object):
    """File wrapper that splits large reads and writes into chunks below 2 GiB."""

    # Largest chunk handed to the wrapped file. The parentheses matter:
    # "1 << 31 - 1" parses as "1 << 30" because "-" binds tighter than "<<".
    MAX_CHUNK = (1 << 31) - 1

    def __init__(self, f):
        self.f = f

    def __getattr__(self, item):
        # Forward everything else (readline, close, ...) to the wrapped file.
        return getattr(self.f, item)

    def read(self, n):
        if n >= (1 << 31):
            buffer = bytearray(n)
            idx = 0
            while idx < n:
                batch_size = min(n - idx, self.MAX_CHUNK)
                buffer[idx:idx + batch_size] = self.f.read(batch_size)
                idx += batch_size
            return buffer
        return self.f.read(n)

    def write(self, buffer):
        n = len(buffer)
        print("writing total_bytes=%s..." % n, flush=True)
        idx = 0
        while idx < n:
            batch_size = min(n - idx, self.MAX_CHUNK)
            print("writing bytes [%s, %s)... " % (idx, idx + batch_size), end="", flush=True)
            self.f.write(buffer[idx:idx + batch_size])
            print("done.", flush=True)
            idx += batch_size
        # File-like write() is expected to report how many bytes were written.
        return n


def pickle_dump(obj, file_path):
    with open(file_path, "wb") as f:
        return pickle.dump(obj, MacOSFile(f), protocol=pickle.HIGHEST_PROTOCOL)


def pickle_load(file_path):
    with open(file_path, "rb") as f:
        return pickle.load(MacOSFile(f))
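
As tdelaney's comment points out, the wrapper does not check how many bytes each call actually read or wrote. Here is a rough sketch of a variant that does, in case it helps. The name `ChunkedFile` and the `chunk_size` parameter are my own and not from the original article. The chunk size is adjustable only so the chunking can be tried on a small in-memory file.

```python
import io
import pickle


class ChunkedFile:
    """Hypothetical wrapper: chunked reads/writes that honour actual counts."""

    def __init__(self, f, chunk_size=(1 << 31) - 1):
        self.f = f
        self.chunk_size = chunk_size

    def __getattr__(self, item):
        # Forward everything else (readline, close, ...) to the wrapped file.
        return getattr(self.f, item)

    def read(self, n=-1):
        if n is None or n < 0:
            return self.f.read()
        parts = []
        remaining = n
        while remaining > 0:
            chunk = self.f.read(min(remaining, self.chunk_size))
            if not chunk:
                # End of file: stop instead of looping forever.
                break
            parts.append(chunk)
            remaining -= len(chunk)
        return b"".join(parts)

    def write(self, buffer):
        # memoryview slices avoid copying each chunk (assumes 1-byte items,
        # which is what pickle passes).
        view = memoryview(buffer)
        idx = 0
        while idx < len(view):
            written = self.f.write(view[idx:idx + self.chunk_size])
            if not written:
                raise OSError("underlying write made no progress")
            # Advance by what was really written, not what was requested.
            idx += written
        return idx


# Round trip through an in-memory file with a tiny chunk size.
data = {"boards": list(range(1000)), "value": b"x" * 5000}
buf = io.BytesIO()
pickle.dump(data, ChunkedFile(buf, chunk_size=64), protocol=pickle.HIGHEST_PROTOCOL)
buf.seek(0)
restored = pickle.load(ChunkedFile(buf, chunk_size=64))
```

With a real file you would leave `chunk_size` at its default and wrap the object returned by `open(...)`, the same way `pickle_dump` and `pickle_load` do above.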
pygo