
I have a multi-dimensional array saved with numpy.save and want to load only part of some dimensions, because the array is very big.

How can I do this in a simple way?


Edit: The context is simple and basic:

You have a 5 GB array saved with numpy.save, but you only need access to some parts of the array, A[:,:], without loading all 5 GB into memory.


ANSWER: use h5py to save/load the data partially. Here is a code sample:

    import sys
    import h5py

    def main():
        data = read()

        if sys.argv[1] == 'x':
            x_slice(data)
        elif sys.argv[1] == 'z':
            z_slice(data)

    def read():
        # Opening the file and returning the dataset does not load it;
        # h5py reads from disk only the parts you slice later.
        f = h5py.File('/tmp/test.hdf5', 'r')
        return f['seismic_volume']

    def z_slice(data):
        return data[:, :, 0]

    def x_slice(data):
        return data[0, :, :]

    if __name__ == '__main__':
        main()
    Editing your question and adding a code sample of what you have already tried will help you in getting answers. I suggest you do it. – Dimitris Fasarakis Hilliard Dec 31 '15 at 03:31
  • @Jim, that is not simple enough to be pythonic. – Mad Physicist Dec 31 '15 at 03:41
  • Is this question unclear because it lacks a code sample, or unclear because you don't know anything about `numpy` `save` and `load` methods? The only part of the question that is unclear to me is the 'load some dimension' phrase, which could mean several things. – hpaulj Dec 31 '15 at 16:38
  • Correct answer to this question: use HDF5 to save the data; partial loading is then possible. –  Jan 01 '16 at 08:02
  • @Jim: The question is clear, but you seem not to understand the need to load some data partially. –  Jan 01 '16 at 08:12
    @quantCode I never voted it as unclear. I merely gave a suggestion that you should add what you have already tried. I will be voting to re-open this question, when and if that happens, take the answer you edited in the question and post it as an answer. – Dimitris Fasarakis Hilliard Jan 01 '16 at 16:52
  • Of course, I tried looking into numpy's functionality. As there is no such functionality, this is not a bug: there was no code to post... The answer posted is one possible way, with reasonable code quality and speed (other solutions might exist). I don't understand why this post is closed, since the question is a very useful one when dealing with large data. –  Jan 06 '16 at 04:05

1 Answer


You'd have to intentionally save the array for partial loading; you can't do it generically.

You could, for example, split the array (along one of its dimensions) and save the subarrays with savez. load on such an archive is 'lazy', only reading the sub-files you ask for.
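A minimal sketch of that idea (the file path and the chunk-naming scheme are my own assumptions, not anything from the question):

```python
import numpy as np

# Split a large array into chunks along axis 0 and save them in one
# .npz archive. np.load on an .npz is lazy: each chunk is read from
# disk only when you index the archive by name.
a = np.arange(24).reshape(6, 4)
chunks = {'chunk%d' % i: part for i, part in enumerate(np.array_split(a, 3))}
np.savez('/tmp/chunked.npz', **chunks)

archive = np.load('/tmp/chunked.npz')
first = archive['chunk0']   # only this sub-array is read now
print(first.shape)          # (2, 4)
```

The trade-off is that you can only slice at chunk granularity; picking the split axis to match your access pattern matters.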

h5py is an add-on package that reads and writes HDF5 files. HDF5 allows partial reads.

numpy.memmap is another option, treating a file as memory that stores an array.
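A short sketch of the memmap approach (the raw-file path, shape, and dtype here are illustrative assumptions; note a raw binary file stores no shape or dtype, so you must supply them yourself):

```python
import numpy as np

# Write a raw binary file, then map it with np.memmap so that slicing
# reads only the touched bytes from disk instead of the whole file.
big = np.arange(20, dtype='float64').reshape(4, 5)
big.tofile('/tmp/big.dat')

view = np.memmap('/tmp/big.dat', dtype='float64', mode='r', shape=(4, 5))
row = np.array(view[2])   # copies just one row into memory
print(row)                # [10. 11. 12. 13. 14.]
```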

Look up the docs for these, as well as previous SO questions.

How can I efficiently read and write files that are too large to fit in memory?

Fastest save and load options for a numpy array

Writing a large hdf5 dataset using h5py


To elaborate on the hold votes: there are minor points that aren't clear. What exactly do you mean by 'load some dimension'? The simplest interpretation is that you want A[0,...] or A[3:10,...]. The other is the implication of 'simple way'. Does that mean you already have a complex way, and want a simpler one? Or just that you don't want to rewrite the numpy.load function to do the task?

Otherwise I think the question is reasonably clear - and the simple answer is: no, there isn't a simple way.

I'm tempted to reopen the question so other experienced numpy posters can weigh in.


I should have reviewed the load docs (the OP should have as well!). As ali_m commented, there is a memory-map mode. The docs say:

mmap_mode : {None, 'r+', 'r', 'w+', 'c'}, optional

   If not None, then memory-map the file, using the given mode
    (see `numpy.memmap` for a detailed description of the modes).
    A memory-mapped array is kept on disk. However, it can be accessed
    and sliced like any ndarray.  Memory mapping is especially useful for
    accessing small fragments of large files without reading the entire
    file into memory.
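Applied to the 5 GB use case in the question, that looks roughly like this (file name is my own placeholder; this works for plain .npy files saved with numpy.save):

```python
import numpy as np

# np.load with mmap_mode='r' returns a memmap instead of reading the
# whole file; slicing it materializes only the requested part.
a = np.arange(12).reshape(3, 4)
np.save('/tmp/test_mmap.npy', a)

mapped = np.load('/tmp/test_mmap.npy', mmap_mode='r')
piece = np.array(mapped[1, :2])   # only this slice is read into memory
print(piece)                      # [4 5]
```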

How does numpy handle mmaps over npz files? (I dug into this months ago, but forgot the option.)

Python memory mapping

hpaulj
    You can definitely do a partial read of a "generic" saved array, for example by passing the `mmap_mode=` parameter to `np.load` – ali_m Dec 31 '15 at 17:04
  • Hello, thank you, I will look into memmap or mmap; it looks promising for accessing data without loading 5 GB into memory... –  Jan 01 '16 at 03:46
  • I find the question and the answers useful. There seem to be (at least) two different solutions mentioned - manually "saving in chunks", and downloading and installing the [h5py](https://pypi.python.org/pypi/h5py) package. I think reopening would be fine. – uhoh Jan 02 '16 at 02:28