I've been using the pickle library to read and write numpy arrays, but the resulting files tend to be very large. While looking for a better way, I found Mark's answer on this page (the one with the chart). According to that chart, storing the array as a raw binary file is not only the fastest option for both reading and writing, it also produces one of the smallest files. So I followed his GitHub link and found, on line 96, the code I believe he uses to save the ndarrays. His code is:
class Binary(TimeArrStorage):
    def save(self, arr, pth):
        with open(pth, 'wb+') as fh:
            fh.write(b'{0:s} {1:d} {2:d}\n'.format(arr.dtype, *arr.shape))
            fh.write(arr.data)
            sync(fh)

    def load(self, pth):
        with open(pth, 'rb') as fh:
            dtype, w, h = str(fh.readline()).split()
            return frombuffer(fh.read(), dtype=dtype).reshape((int(w), int(h)))
My specific questions are:
First, what is the meaning of the string passed to the first call to fh.write? I assume the leading "b" means bytes/binary, but what about the {0:s} {1:d} {2:d}, especially since there are only two arguments inside the parentheses after format (arr.dtype and *arr.shape)?
Second, can this method be used for ndarrays of any data type?
Third, do I need to call the sync function (it is defined at the top of the GitHub page)?
Last, I looked up what arr.data returns when arr is an ndarray, and it is basically a pointer to where the array's data begins in memory. So how does this code know when it has reached the end of the object it is trying to write?
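
To make the first and last questions concrete, here is a small sketch of my current understanding, written for Python 3. The helper name make_header and the sample array are my own and are not part of Mark's code. I build the header as a str and encode it, because bytes objects in Python 3 have no .format method. I'm assuming a 2-D array, as the original code does.

```python
import numpy as np

def make_header(arr):
    # Hypothetical helper (not from the original code): builds the
    # "dtype rows cols\n" line for a 2-D array.
    # {0:s}, {1:d}, {2:d} are positional fields; *arr.shape unpacks the
    # (rows, cols) tuple into positional arguments 1 and 2.
    text = '{0:s} {1:d} {2:d}\n'.format(str(arr.dtype), *arr.shape)
    return text.encode('ascii')

arr = np.arange(6, dtype=np.int32).reshape(2, 3)
header = make_header(arr)
print(header)  # b'int32 2 3\n'

# In Python 3, arr.data is a memoryview, which carries its own length,
# so a write call can tell how many bytes to emit.
buf = arr.data
print(type(buf).__name__, buf.nbytes, arr.nbytes)  # memoryview 24 24

# Round trip through raw bytes, mirroring what load() does after the header.
restored = np.frombuffer(bytes(buf), dtype='int32').reshape(2, 3)
```

If this is right, the three fields are filled by arr.dtype plus the two unpacked shape values, and the length of the write comes from the buffer object itself rather than from any end marker in the data.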