
In the following example, I initialize a 500x2000x2000 three-dimensional NumPy array named a. At each iteration of the for-loop, a random two-dimensional array r is inserted into array a. This example stands in for a larger piece of code where the r array would be produced by various calculations at each iteration, so each slice along the z dimension of a is computed and assigned one at a time.

# ex1_basic.py

import numpy as np
import time

def main():
    tic = time.perf_counter()

    z = 500     # depth
    x = 2000    # rows
    y = 2000    # columns

    a = np.zeros((z, x, y))

    for i in range(z):
        r = np.random.rand(x, y)
        a[i] = r

    toc = time.perf_counter()
    print('elapsed =', round(toc - tic, 2), 'sec')

if __name__ == '__main__':
    main()

This example's memory is profiled using the memory-profiler package. The steps for profiling this example are:

# Run the memory profiler
$ mprof run ex1_basic.py

# Plot the memory profiler results
$ mprof plot

The memory usage is plotted below; it increases over time as values are added to the array.

figure 1

I profiled another example where the data type for the array is defined as np.float32. See below for the example code and memory usage plot. This decreases the overall memory use, but memory still increases with each iteration.

import numpy as np
import time

def main():

    rng = np.random.default_rng()

    tic = time.perf_counter()

    z = 500     # depth
    x = 2000    # rows
    y = 2000    # columns

    a = np.zeros((z, x, y), dtype=np.float32)

    for i in range(z):
        r = rng.standard_normal((x, y), dtype=np.float32)
        a[i] = r

    toc = time.perf_counter()
    print('elapsed =', round(toc - tic, 2), 'sec')

if __name__ == '__main__':
    main()

figure 2

Since I initialize array a using np.zeros, I would expect the memory usage to remain constant, corresponding to the block of memory allocated for that array. But it appears that memory usage grows as values are inserted into array a.
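
For reference, the fully allocated array corresponds to a fixed block of memory whose size can be computed up front (16 GB for the float64 case, 8 GB for the float32 case). A quick back-of-the-envelope check:

import numpy as np

z, x, y = 500, 2000, 2000

# total bytes if every element of the (z, x, y) array were resident in memory,
# where 1 GB = 1e9 bytes
print(np.dtype(np.float64).itemsize * z * x * y / 1e9)  # 16.0 GB
print(np.dtype(np.float32).itemsize * z * x * y / 1e9)  # 8.0 GB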

So I have two questions related to these examples:

  1. Why does the memory usage increase with time?
  2. How do I create and store array a on disk and only have slice a[i] and the r array in memory at each iteration? Basically, how would I run these examples if the a array did not fit in memory (RAM)?

Update

I ran an example using numpy.memmap, but there is no improvement in memory usage. It seems like memmap still keeps the entire array in memory.

import numpy as np
import time

def main():

    rng = np.random.default_rng()

    tic = time.perf_counter()

    z = 500
    x = 2000
    y = 2000

    a = np.memmap('file.dat', dtype=np.float32, mode='w+', shape=(z, x, y))

    for i in range(z):
        r = rng.standard_normal((x, y), dtype=np.float32)
        a[i] = r

    toc = time.perf_counter()
    print('elapsed =', round(toc - tic, 2), 'sec')

if __name__ == '__main__':
    main()

figure 3

  • This has to do with optimization of sparse, big tensors. As you define values, memory is lazily allocated: https://stackoverflow.com/questions/27574881/why-does-numpy-zeros-takes-up-little-space – rafaelc Jun 03 '22 at 19:54
  • What happens if you use `a = np.ones((z, x, y))`? – hpaulj Jun 03 '22 at 20:01
  • @rafaelc Ok, that makes sense. So is there a way to keep the array slice `a[i]` and `r` in memory and have the rest of array `a` on disk? Maybe [numpy.memmap](https://numpy.org/doc/stable/reference/generated/numpy.memmap.html) can be used. – wigging Jun 03 '22 at 20:01
  • Yes, `memmap` is a good choice. There are some colabs and other posts on Google with sample codes (e.g. https://colab.research.google.com/github/timeseriesAI/tsai/blob/master/tutorial_nbs/00_How_to_efficiently_work_with_very_large_numpy_arrays.ipynb) . – rafaelc Jun 03 '22 at 20:07
  • @hpaulj Using `a = np.ones((z, x, y))` reaches peak memory usage sooner but is still the same as the peak memory used by the `np.zeros` approach. – wigging Jun 03 '22 at 20:08
  • As an aside, what is the point of looping here? Why not `a = np.random.rand(x, y, z)` or `rng.standard_normal((x, y, z), dtype=np.float32)`? – juanpa.arrivillaga Jun 03 '22 at 20:35
  • @juanpa.arrivillaga Using `a = np.random.rand(x, y, z)` forces the data to be allocated and filled right away. This is not a problem in the first case (in fact, it is better there). However, it is a problem when `memmap` is used in the second case, since the array would then be allocated in RAM, and the whole point of `memmap` here is not to use the RAM. – Jérôme Richard Jun 03 '22 at 20:42
  • What is surprising - the peak use, or the gradual increase? – hpaulj Jun 03 '22 at 20:48
  • Have you tried a regular memmap `fp.flush()`? – hpaulj Jun 03 '22 at 22:51
  • @hpaulj I have not tried the `flush()` function. Also, I updated my question to hopefully explain things better. – wigging Jun 03 '22 at 23:56
  • memmap releases its memory only when there is not enough free memory in the system; if there is enough, it consumes as much memory as a regular array. Since the memory is released automatically when the system runs low, memmap should be sufficient if your goal is just to prevent out-of-memory errors. If you want to keep memory usage low, you need to periodically (e.g. when `i % 10 == 0`) run `del a` and reopen it; a sketch of this pattern follows after these comments. – ken Jun 04 '22 at 07:45
  • As already mentioned, h5py could be an option. To get the best possible performance you have to set up chunk_shape and chunk_cache: https://stackoverflow.com/a/48405220/4045774 Another option (depending on the read/write pattern and compression ratio) would be to create a custom object, e.g. based on this: https://stackoverflow.com/a/56761075/4045774 This would be quite easy to implement if you only iterate over the first axis. – max9111 Jun 08 '22 at 11:58
  • @max9111 Can you submit an answer that demonstrates chunk_shape, chunk_cache, and custom object using h5py? You can compare the elapsed time and storage size to the example in the current answer. – wigging Jun 08 '22 at 13:02
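
Below is a minimal sketch of the flush-and-reopen pattern hpaulj and ken describe in the comments above, assuming the same shapes and file name as the memmap example in the question. The 10-iteration interval is an arbitrary choice for illustration, and whether the pages are actually released is up to the operating system.

import numpy as np

def main():
    rng = np.random.default_rng()

    z = 500     # depth
    x = 2000    # rows
    y = 2000    # columns

    # create the file-backed array once, then reopen it in r+ mode inside the loop
    a = np.memmap('file.dat', dtype=np.float32, mode='w+', shape=(z, x, y))

    for i in range(z):
        r = rng.standard_normal((x, y), dtype=np.float32)
        a[i] = r

        if i % 10 == 0:
            a.flush()  # write dirty pages to disk
            del a      # drop the reference so the mapping can be released
            a = np.memmap('file.dat', dtype=np.float32, mode='r+', shape=(z, x, y))

    a.flush()

if __name__ == '__main__':
    main()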

1 Answer

Using the h5py package, I can create an hdf5 file that contains a dataset representing the a array. The dset variable plays the role of the a variable discussed in the question, but the array resides on disk rather than in memory. The generated hdf5 file is 8 GB on disk, which is the size of an array of np.float32 values with this shape. The elapsed time for this approach is similar to the examples discussed in the question, so writing to the hdf5 file appears to have a negligible performance impact.

import numpy as np
import h5py
import time

def main():

    rng = np.random.default_rng()

    tic = time.perf_counter()

    z = 500   # depth
    x = 2000  # rows
    y = 2000  # columns

    f = h5py.File('file.hdf5', 'w')
    dset = f.create_dataset('data', shape=(z, x, y), dtype=np.float32)

    for i in range(z):
        r = rng.standard_normal((x, y), dtype=np.float32)
        dset[i, :, :] = r

    f.close()  # close the file, flushing any buffered writes to disk

    toc = time.perf_counter()
    print('elapsed time =', round(toc - tic, 2), 'sec')

    s = np.float32().nbytes * (z * x * y) / 1e9  # where 1 GB = 1e9 bytes
    print('calculated storage =', s, 'GB')

if __name__ == '__main__':
    main()

Output from running this example on a MacBook Pro with a 2.6 GHz 6-Core Intel Core i7 and 32 GB of RAM:

elapsed time = 22.97 sec
calculated storage = 8.0 GB

Running the memory profiler for this example gives the plot shown below. The peak memory usage is about 100 MiB, which is drastically lower than in the examples demonstrated in the question.

hdf5 plot
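
As a usage note, reading the data back works the same way: indexing the dataset loads only the requested slice into memory. A minimal sketch, assuming the file.hdf5 created above:

import h5py

# open the file created above in read-only mode
with h5py.File('file.hdf5', 'r') as f:
    dset = f['data']           # handle to the dataset, no data is read yet
    print(dset.shape, dset.dtype)

    for i in range(dset.shape[0]):
        slice_i = dset[i]      # only this (x, y) slice is loaded into memory
        # ... process slice_i here ...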
