5

I have a .npy file about which I know basically everything (size, number of elements, element type, etc.), and I'd like a way to retrieve specific values without loading the whole array. The goal is to use as little memory as possible.

I'm looking for something like

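def extract(filename, i, j):
    # Desired behavior: return element [i, j] of the array stored in
    # filename (e.g. 'test.npy') without loading the whole array.
    ...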

I kinda know how to do it with a text file (see recent questions), but doing this with an .npy array would allow me to do more than line extraction.

Also if you know any way to do this with a scipy sparse matrix that would be really great.

Thank you.

AdrienNK
  • 2
    See [What is the way data is stored in *.npy](http://stackoverflow.com/q/4090080/222914) – Janne Karila Mar 10 '14 at 12:55
  • In short, the way the data is stored in the `.npy` format would make this difficult. I would recommend using HDF5 instead, which allows you to read or modify any arbitrary array or slice of an array - take a look at [h5py](http://www.h5py.org/) or [PyTables](http://www.pytables.org). – ali_m Mar 10 '14 at 13:39
  • 5
    @ali_m - That's not true at all. `.npy` files are designed to be memmapped. HDF5 is certainly more efficient for this, as it chunks the file (less chance of long seeks on disk), but support for memmapping `.npy` files is built into numpy. Just use `np.load(filename, mmap_mode='r')`. In general, though, I certainly agree that HDF5 is a better choice in the long run. – Joe Kington Mar 10 '14 at 13:45
  • @JoeKington I stand corrected! – ali_m Mar 10 '14 at 14:01

1 Answer

8

Just use `data = np.load(filename, mmap_mode='r')` (or one of the other modes, such as `'r+'`, if you also need to change specific elements).

This returns a memory-mapped array. Its contents stay on disk rather than being loaded into memory, but you can access individual items by indexing the array as you normally would. (Be aware that some slices will take much longer to access than others, depending on the shape and memory order of your array.)
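As a rough sketch of the `extract` function from the question, built on the memory-mapped load described above: the temporary file, the array contents, and the assumption that the array is 2-D are all made up for illustration.

```python
import os
import tempfile

import numpy as np


def extract(filename, i, j):
    """Return element [i, j] of the 2-D array stored in an .npy file."""
    # mmap_mode='r' maps the file read-only; the array data is not read here.
    data = np.load(filename, mmap_mode='r')
    # Indexing pulls only the requested element from disk.
    return data[i, j].item()


# Demo on a small throwaway file (the name 'test.npy' is just an example).
path = os.path.join(tempfile.mkdtemp(), 'test.npy')
np.save(path, np.arange(12, dtype=np.int64).reshape(3, 4))

value = extract(path, 2, 1)
print(value)  # 9, i.e. row 2, column 1 of the 3x4 array
```

Reopening the file on every call keeps the function self-contained. If you need many lookups, it is probably better to call `np.load(..., mmap_mode='r')` once and index the returned array repeatedly.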

HDF is a more efficient format for this, but the .npy format is designed to allow for memmapped arrays.
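For the case where specific elements need to be changed, here is a minimal sketch using the `'r+'` mode. The file name and values are invented for the example.

```python
import os
import tempfile

import numpy as np

# Create a small example file to work on.
path = os.path.join(tempfile.mkdtemp(), 'test.npy')
np.save(path, np.zeros((3, 4), dtype=np.float64))

# 'r+' maps the existing file for both reading and writing.
data = np.load(path, mmap_mode='r+')
data[1, 2] = 42.0  # modify a single element in place
data.flush()       # make sure the change is written to disk
del data           # drop the mapping

# Reload normally to confirm that only the one element changed.
reloaded = np.load(path)
changed = float(reloaded[1, 2])
untouched = float(reloaded[0, 0])
print(changed, untouched)
```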

Joe Kington