I am trying to calculate a mean value across a large stack of numpy arrays. Originally, I tried:
import numpy as np

data = (np.ones((10**6, 133)) for _ in range(100))
np.stack(data).mean(axis=0)
but I was getting
numpy.core._exceptions.MemoryError: Unable to allocate xxx GiB for an array with shape (100, 1000000, 133) and data type float32
In the original code, data is a generator of more meaningful vectors.
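For scale, stacking everything needs roughly 100 × 10**6 × 133 × 4 bytes ≈ 50 GiB as float32 (and about double that for the float64 that np.ones produces by default), which is far more RAM than I have.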
I thought about using dask for this operation, hoping it would split my data into chunks backed by disk.
import dask.array as da
import numpy as np
data = (np.ones((10**6, 133)) for _ in range(100))
x = da.stack(da.from_array(arr, chunks="auto") for arr in data)
x = da.mean(x, axis=0)
y = x.compute()
However, when I run it, the process terminates with "Killed".
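I suspect it dies for the same reason as before: each array the generator yields is still a fully materialized numpy array before da.from_array wraps it, so all 100 blocks end up in memory anyway. For what it's worth, a plain streaming mean along the lines below (a minimal sketch, assuming every block has the same shape and equal weight) should stay within memory because only one block is resident at a time, but I would prefer a dask-based approach that generalizes to other reductions:

import numpy as np

def streaming_mean(blocks):
    # Accumulate a running sum over a generator of equally-shaped arrays,
    # keeping only one block plus the accumulator in memory at a time.
    total = None
    count = 0
    for block in blocks:
        if total is None:
            total = np.zeros_like(block, dtype=np.float64)
        total += block
        count += 1
    return total / count

data = (np.ones((10**6, 133)) for _ in range(100))
mean = streaming_mean(data)  # shape (10**6, 133)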
How can I resolve this issue on a single machine?