
Say I have thousands of 2D numpy arrays (each of shape 600x600) saved in a text file. I would like to read the file pixel by pixel across all the arrays and operate on a 1D array of those pixel values, without having to load the whole file, since that would use a lot of memory.

For example, if this was in my file:

array([[1, 42, 98, ..., 2], ..., [89, 10, 76, ..., 2]]), array([[36, 79, 13, ..., 11], [81, 101, 34, ..., 109]]), ...

I would then want (for the [0][0] position) [1, 36, ...], for [0][1] I would want [42, 79, ...] and so on. After I'm done operating on each 1D array, I'd like to delete it from memory and move on to reading the next one. Is this possible? It also doesn't have to be from a text file, if another type of file would work better.

curious_cosmo

1 Answer


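You can work with numpy memmap. Save the arrays in numpy's binary `.npy` format, then load them using np.load with the mmap_mode parameter set to a mode string such as 'r' (read-only); it takes a mode string, not True. From the docs: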

Create a memory-map to an array stored in a binary file on disk.

Memory-mapped files are used for accessing small segments of large files on disk, without reading the entire file into memory. NumPy’s memmap’s are array-like objects.
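
As a minimal sketch of what this looks like in practice, assuming the arrays have already been stacked into a single `.npy` file of shape (N, height, width): the file name `stack.npy`, the tiny 5x4x4 test data, and the variable names below are illustrative stand-ins, not anything from the question.

```python
import os
import tempfile

import numpy as np

# Stand-in data: 5 arrays of 4x4 instead of thousands of 600x600.
rng = np.random.default_rng(0)
stack = rng.integers(0, 255, size=(5, 4, 4), dtype=np.int64)

# Hypothetical file location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "stack.npy")
np.save(path, stack)

# mmap_mode="r" maps the file read-only; data is read only when indexed.
mm = np.load(path, mmap_mode="r")

# Values at position [0][1] across every array, copied into a normal ndarray.
pixel_series = np.array(mm[:, 0, 1])

# ... operate on pixel_series here, then drop it before the next pixel ...
del pixel_series
pixel_series = np.array(mm[:, 0, 0])
```

With this (N, height, width) layout each per-pixel series is a strided read across the file. If that turns out to be slow, storing the data as (height, width, N) would make each series contiguous on disk, at the cost of a different write pattern.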

Deepak Saini
  • Thank you, does this work if I dumped my arrays into a binary file via `pickle`, element by element? – curious_cosmo Aug 29 '18 at 17:40
  • Dump the arrays to disk as binary; for how to do that, see https://stackoverflow.com/questions/13780907/is-it-possible-to-np-concatenate-memory-mapped-files. Then, as mentioned in the answer, just load in memory-mapped mode. – Deepak Saini Aug 29 '18 at 17:51
  • I need to dump the arrays to the file one by one, to conserve memory. Unfortunately `np.save` does not have an `append` mode, and if I use something else like `pickle` to dump one by one, I cannot seem to get `np.load` to work to use in `memmap`. Do you know of a way to handle this? – curious_cosmo Aug 29 '18 at 19:10
  • While `pickle` can save multiple arrays (using `np.save` as the array pickler), `unpickle` does not implement a `mmap_mode` as `load` does. With `save/load`, the data buffer of an array is written to the file, and `mmap_mode` allows `load` to read portions of that buffer. If you use `pickle`, the file contains the data buffers for multiple arrays (separated by header blocks), so it isn't a memmap-compatible file. – hpaulj Aug 30 '18 at 01:19
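
One possible way around the missing append mode, sketched here under the assumption that the total number of arrays is known before writing starts: `numpy.lib.format.open_memmap` creates a `.npy` file of a given shape up front, and each array can then be written into its slot one at a time. The `make_array` function, the file name, and the small sizes are hypothetical placeholders for however the real arrays are produced.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Stand-in sizes; the real case would be thousands of 600x600 arrays.
n_arrays, height, width = 6, 4, 4

# Hypothetical file location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "stack.npy")


def make_array(k):
    """Hypothetical producer of the k-th 2D array (filled with the value k)."""
    return np.full((height, width), k, dtype=np.int32)


# Create the .npy file with its final shape; only one array is in memory at a time.
out = open_memmap(path, mode="w+", dtype=np.int32,
                  shape=(n_arrays, height, width))
for k in range(n_arrays):
    out[k] = make_array(k)
out.flush()
del out  # release the writable map

# The result is an ordinary .npy file, so np.load can memory-map it.
mm = np.load(path, mmap_mode="r")
first_pixel = np.array(mm[:, 0, 0])
```

If the number of arrays is not known in advance, this sketch does not apply as written, since the shape has to be fixed when the file is created.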