
I have used np.load with mmap_mode='r' to load "chunks" of a NumPy array into RAM without having to load the entire array (which would result in a memory error).
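
For context, the read side looks roughly like this (the file name here is just an illustration); only the slices that are actually indexed get pulled into RAM:

import numpy as np

db = np.load('db.npy', mmap_mode='r')  # memory-mapped, not read into RAM
chunk = np.array(db[:100000])          # copying a slice materialises only those rows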

My question is how to do the opposite: load a "chunk" from a small file, append it to a NumPy array on disk that is too big to fit in RAM, and then save the enlarged array.

This code is an example of what I have tried. I was expecting it not to load the entire 'db' file, but simply to append the additional "chunk" to it and save the result:

import numpy as np

for Num in range(0, 10000):
  chunk = np.random.uniform(0, 1, size=(100000, 1000))
  if Num == 0:
    # first iteration: create the file from the first chunk
    np.save('/content/drive/My Drive/Share/Daily Data/Database/db.npy', chunk)
  else:
    # open the existing array memory-mapped, stack the new chunk on, re-save
    db = np.load('/content/drive/My Drive/Share/Daily Data/Database/db.npy', mmap_mode='r')
    db = np.vstack((db, chunk))
    np.save('/content/drive/My Drive/Share/Daily Data/Database/db.npy', db)
    del db

But that is not working: the vstack line appears to load all of 'db' anyway, because each additional iteration takes longer and longer and eventually raises a memory error. How can I change this code so that I achieve the same result (appending a 'chunk' to 'db') without loading the entire 'db' into RAM?
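
One approach that avoids concatenating in RAM (a sketch under the assumption that the total number of rows is known up front, not a definitive answer) is to preallocate the full-size .npy file once with np.lib.format.open_memmap and write each chunk into its own block of rows; the path, dtype, and shapes below are placeholders taken from the example above:

import numpy as np
from numpy.lib.format import open_memmap

rows_per_chunk = 100000
n_chunks = 10000
path = 'db.npy'  # placeholder path

# create the full-size .npy file on disk once; pages only come into RAM when written
db = open_memmap(path, mode='w+', dtype=np.float64,
                 shape=(rows_per_chunk * n_chunks, 1000))

for num in range(n_chunks):
    chunk = np.random.uniform(0, 1, size=(rows_per_chunk, 1000))
    start = num * rows_per_chunk
    db[start:start + rows_per_chunk] = chunk  # written through to the mapped file
db.flush()
del db

Each assignment writes the chunk straight to the mapped file, so peak memory stays at roughly one chunk no matter how large the final array gets.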

lara_toff
  • Does this answer your question? [save numpy array in append mode](https://stackoverflow.com/questions/30376581/save-numpy-array-in-append-mode) – G. Anderson Sep 08 '20 at 17:01
  • Would prefer to stay away from HDF5 ... – lara_toff Sep 08 '20 at 17:06
  • `vstack` just calls `np.concatenate`. That makes a new array from the tuple of inputs. What else did you expect it to do? Concatenation is done in memory, not on any file. And the `np.save` is creating a whole new file. It isn't added to the one opened with the `np.load`. – hpaulj Sep 08 '20 at 18:03
  • 1
    OK so what do I do instead... – lara_toff Sep 08 '20 at 18:07

0 Answers