I have a huge compressed numpy array saved to disk (~20 GB in memory, much less when compressed). I need to know the shape of this array, but I do not have enough memory to load it. How can I find the shape of the numpy array without loading it into memory?
2 Answers
17
This does it:
import numpy as np
import zipfile

def npz_headers(npz):
    """Takes a path to an .npz file, which is a Zip archive of .npy files.
    Generates a sequence of (name, shape, np.dtype).
    """
    with zipfile.ZipFile(npz) as archive:
        for name in archive.namelist():
            if not name.endswith('.npy'):
                continue
            # Only the start of each member is read: the magic string, then the header.
            npy = archive.open(name)
            version = np.lib.format.read_magic(npy)
            # Note: _read_array_header is a private NumPy helper and may change between releases.
            shape, fortran, dtype = np.lib.format._read_array_header(npy, version)
            # Strip the '.npy' suffix to recover the array's name.
            yield name[:-4], shape, dtype
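
For anyone who would rather not call a private helper, here is a rough sketch of the same idea using the public `read_array_header_1_0` and `read_array_header_2_0` functions in `numpy.lib.format`. The function name `npz_shapes` and the demo file name are made up for illustration, and the sketch assumes the archive members use format version 1.0 or 2.0 (it raises for anything else rather than guessing).

```python
import os
import tempfile
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def npz_shapes(path):
    """Return {array_name: (shape, dtype)}, reading only each member's header.

    Hypothetical helper: handles .npy format versions 1.0 and 2.0 only.
    """
    result = {}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.endswith('.npy'):
                continue
            with archive.open(name) as npy:
                # read_magic consumes the magic string and returns (major, minor).
                version = npy_format.read_magic(npy)
                if version == (1, 0):
                    shape, fortran, dtype = npy_format.read_array_header_1_0(npy)
                elif version == (2, 0):
                    shape, fortran, dtype = npy_format.read_array_header_2_0(npy)
                else:
                    raise ValueError(f"unsupported .npy version {version} in {name}")
            # Strip the '.npy' suffix to recover the array's name.
            result[name[:-4]] = (shape, dtype)
    return result


if __name__ == "__main__":
    # Small demo archive written to a temporary directory.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")
    np.savez_compressed(demo_path, a=np.zeros((3, 4)), b=np.arange(5))
    for array_name, (shape, dtype) in npz_shapes(demo_path).items():
        print(array_name, shape, dtype)
```

Because only the first bytes of each zip member are read, the array data itself is never fully decompressed.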

John Zwinck
-
This answer is perfect and should really be the accepted one... – Aristides Mar 17 '21 at 01:27
8
Opening the file with the `mmap_mode` argument of `np.load` might do the trick. From the documentation:
If not None, then memory-map the file, using the given mode (see `numpy.memmap` for a detailed description of the modes). A memory-mapped array is kept on disk. However, it can be accessed and sliced like any ndarray. Memory mapping is especially useful for accessing small fragments of large files without reading the entire file into memory.
It is also possible to read the header block without reading the data buffer, but that requires digging further into the underlying `lib/npyio/format` code. I explored that in a recent SO question about storing multiple arrays in a single file (and reading them).
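
A minimal sketch of the `mmap_mode` approach, assuming the data is a plain uncompressed `.npy` file (as the comments below point out, this does not carry over to `.npz` archives). The function name `npy_shape` and the demo file are hypothetical.

```python
import os
import tempfile

import numpy as np


def npy_shape(path):
    """Return (shape, dtype) of a .npy file without reading its data.

    With mmap_mode='r', np.load returns a memory-mapped array: the header
    is parsed, but the data stays on disk until it is actually accessed.
    """
    arr = np.load(path, mmap_mode='r')
    return arr.shape, arr.dtype


if __name__ == "__main__":
    # Small demo file written to a temporary directory.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.npy")
    np.save(demo_path, np.zeros((100, 20), dtype=np.int16))
    print(npy_shape(demo_path))
```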
-
This works for .npy but not .npz. I don't think mmap is at all useful with .npz--certainly not if the data are compressed aka `np.savez_compressed()`. – John Zwinck Apr 05 '17 at 05:30
-
Doing any of this with the `npz` archive will require digging into that branch of the loader, `np.lib.npyio.NpzFile`. Key file format information is in `np.lib.npyio.format` – hpaulj Apr 05 '17 at 06:22
-