I have a 4D array where every value along axis=3 is either 1 or 0. I tried saving it as an array in a .npy file, but for a (252,512,512,6) array this already produced 3GB of data. Is it possible to store this kind of data much more efficiently and so drastically reduce the file size?

I've already tried using "False" and "True", which got it down to about 400MB, but I am still looking for a way to reduce that number further, either via the datatype or the way I am saving it.
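The two sizes quoted above are consistent with simple arithmetic on the element count. The sketch below assumes the original array used an 8-byte integer dtype (NumPy's default on most platforms, not stated in the question) and shows the lower bound if each value took a single bit:

```python
import math
import numpy as np

shape = (252, 512, 512, 6)
n_elements = math.prod(shape)  # total number of 0/1 values

# Assumption: the 3GB file came from a 64-bit integer array (8 bytes per value).
bytes_int64 = n_elements * np.dtype(np.int64).itemsize

# A boolean array uses 1 byte per value.
bytes_bool = n_elements * np.dtype(np.bool_).itemsize

# One bit per value, rounded up to whole bytes.
bytes_packed = -(-n_elements // 8)

for label, size in [("int64", bytes_int64), ("bool", bytes_bool), ("1 bit/value", bytes_packed)]:
    print(f"{label:>12}: {size / 1e6:,.1f} MB")
```

These figures cover the raw array data only; the .npy header adds a small fixed overhead.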

drjeffrey
  • *"... where every value at axis=3 is either a 1 or a 0."* Does that mean *all* the values in the array are either 0 or 1? – Warren Weckesser Feb 20 '20 at 16:05
  • Does this answer your question? [Compress numpy arrays efficiently](https://stackoverflow.com/questions/22400652/compress-numpy-arrays-efficiently) – AMC Feb 20 '20 at 16:41

1 Answer

You can use np.savez_compressed, which will significantly compress the array and reduce the filesize:

# create sample array:
>>> import numpy as np
>>> x = np.random.randint(1, 30, size=(252, 512, 512, 6))

>>> np.savez("test.npz", x)
# test.npz is 2.95GB

>>> np.savez_compressed("test2.npz", arr=x)
# test2.npz is 369MB

To re-load your array, use

>>> loaded = np.load("test2.npz")
>>> x2 = loaded["arr"]

You can check that x2 (your re-loaded array) is equal to x (your original array):

>>> np.array_equal(x, x2)
True
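
Since the array in the question only holds 0s and 1s, the values can also be stored as bits before compressing. The following is a minimal sketch, not a benchmark: it uses a small stand-in array, and the file name `packed.npz` and the keys `packed` and `shape` are made up for illustration. It packs eight values into each byte with np.packbits, saves with np.savez_compressed, and restores the original booleans with np.unpackbits.

```python
import os
import tempfile
import numpy as np

# Small stand-in for the (252, 512, 512, 6) array; values are only 0 or 1.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(4, 16, 16, 6), dtype=np.uint8)

# Booleans take 1 byte per value instead of 8 for the default integer dtype.
x_bool = x.astype(bool)

# Pack 8 values into each byte; axis=None flattens the array first
# and zero-pads the last byte if needed.
packed = np.packbits(x_bool, axis=None)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "packed.npz")  # hypothetical file name
    # The shape is stored alongside the bits so the array can be rebuilt.
    np.savez_compressed(path, packed=packed, shape=np.array(x.shape))

    with np.load(path) as loaded:
        shape = tuple(loaded["shape"])
        # count= drops any padding bits added by packbits.
        bits = np.unpackbits(loaded["packed"], count=int(np.prod(shape)))
        restored = bits.reshape(shape).astype(bool)

print(np.array_equal(restored, x_bool))
```

How much this helps on top of compression depends on the data: random bits barely compress beyond 1 bit per value, while arrays with long runs of identical values can shrink much further.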
sacuL
  • Thank you! It's actually a really logical solution. This together with using True and False got it down to .5 MB – drjeffrey Feb 20 '20 at 16:39