
I am implementing a function that involves operations on NumPy arrays, and I am getting a MemoryError. Below I state explicitly the dimensions of the arrays that are creating the issue.

import numpy as np
import gc

a = np.random.rand(15239, 1)
b = np.random.rand(1, 329960)
c = np.subtract(a, b)**2      # broadcasts to shape (15239, 329960)
d = np.random.rand(15239, 1)
e = np.random.rand(1, 329960)
del a
gc.collect()
f = np.subtract(d, e)**2      # another (15239, 329960) array
del d
gc.collect()
g = np.sqrt(c + f).min(axis=0)   # c + f and sqrt() create further (15239, 329960) temporaries
del c, f
gc.collect()

I get a MemoryError when running this.
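For scale, here is a rough back-of-the-envelope check of the broadcast intermediates (float64 is the default dtype of np.random.rand):

bytes_per_array = 15239 * 329960 * 8     # one (15239, 329960) float64 array
print(bytes_per_array / 2**30)           # ≈ 37.5 GiB
# c, f, the c + f temporary and the sqrt() result can all be alive at the same time,
# so peak usage can reach several times that before .min(axis=0) reduces anything.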

For context, the full function in which these operations are used is given below:

from skimage.segmentation import find_boundaries
import numpy as np

# w0 and sigma are assumed to be defined at module level
# (the UNet paper's defaults are w0 = 10 and sigma = 5).
def make_weight_map(masks):
    """
    Generate the weight maps as specified in the UNet paper
    for a set of binary masks.

    Parameters
    ----------
    masks: array-like
        A 3D array of shape (n_masks, image_height, image_width),
        where each slice of the matrix along the 0th axis represents one binary mask.

    Returns
    -------
    array-like
        A 2D array of shape (image_height, image_width)

    """
    masks = masks.numpy()
    nrows, ncols = masks.shape[1:]
    masks = (masks > 0).astype(int)
    distMap = np.zeros((nrows * ncols, masks.shape[0]))
    X1, Y1 = np.meshgrid(np.arange(nrows), np.arange(ncols))
    X1, Y1 = np.c_[X1.ravel(), Y1.ravel()].T
    for i, mask in enumerate(masks):
        # find the boundary of each mask,
        # compute the distance of each pixel from this boundary
        bounds = find_boundaries(mask, mode='inner')
        X2, Y2 = np.nonzero(bounds)
        # pairwise squared distances from every boundary pixel to every image pixel;
        # xSum and ySum each have shape (n_boundary_pixels, nrows * ncols)
        xSum = (X2.reshape(-1, 1) - X1.reshape(1, -1)) ** 2
        ySum = (Y2.reshape(-1, 1) - Y1.reshape(1, -1)) ** 2
        distMap[:, i] = np.sqrt(xSum + ySum).min(axis=0)
    ix = np.arange(distMap.shape[0])
    if distMap.shape[1] == 1:
        d1 = distMap.ravel()
        border_loss_map = w0 * np.exp((-1 * (d1) ** 2) / (2 * (sigma ** 2)))
    else:
        if distMap.shape[1] == 2:
            d1_ix, d2_ix = np.argpartition(distMap, 1, axis=1)[:, :2].T
        else:
            d1_ix, d2_ix = np.argpartition(distMap, 2, axis=1)[:, :2].T
        d1 = distMap[ix, d1_ix]
        d2 = distMap[ix, d2_ix]
        border_loss_map = w0 * np.exp((-1 * (d1 + d2) ** 2) / (2 * (sigma ** 2)))
    xBLoss = np.zeros((nrows, ncols))
    xBLoss[X1, Y1] = border_loss_map
    # class weight map
    loss = np.zeros((nrows, ncols))
    w_1 = 1 - masks.sum() / loss.size
    w_0 = 1 - w_1
    loss[masks.sum(0) == 1] = w_1
    loss[masks.sum(0) == 0] = w_0
    ZZ = xBLoss + loss
    return ZZ

The traceback of the error when the function is used is given below. I am using a system with 32 GB of RAM; I also tested the code on a 61 GB machine.

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-32-0f30ef7dc24d> in <module>
----> 1 img = make_weight_map(img)

<ipython-input-31-e75a6281476f> in make_weight_map(masks)
     34         xSum = (X2.reshape(-1, 1) - X1.reshape(1, -1)) ** 2
     35         ySum = (Y2.reshape(-1, 1) - Y1.reshape(1, -1)) ** 2
---> 36         distMap[:, i] = np.sqrt(xSum + ySum).min(axis=0)
     37     ix = np.arange(distMap.shape[0])
     38     if distMap.shape[1] == 1:

MemoryError:

I have checked the questions below but couldn't find a solution to my problem:
Python/Numpy Memory Error
Memory growth with broadcast operations in NumPy

This is another question with a memmap approach, but I don't know how to apply it to my use case.

Beginner
  • `c` is large, `(15239,329960)`. Verify that. So is `f`. `c+f` produces another array of that size. And `sqrt` another. The result of `min` is smaller, like `b`, though I don't know if internally it has to make a temporary large array or not. – hpaulj Oct 07 '19 at 16:22
  • Hi, yes, `c` and `f` have a shape of `(15239,329960)`, and so does `c+f` – Beginner Oct 07 '19 at 16:28

1 Answer


No mystery, these are really large arrays. At 64-bit precision, an array of shape (15239,329960) needs...

>>> np.product((15239,329960)) * 8 / 2**30
37.46345967054367

...about 37GiB! Things to try:

  • Reduce the bit-depth, e.g. use np.float16, requiring 25% of the memory.
  • Is the data actually dense, or can you use scipy.sparse?
  • Maybe it's time for dask? (see the sketch after this list)
  • Get more RAM!
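
As a rough illustration of the dask route (a minimal sketch, not tested on the original data; the 5000-column chunk width is an arbitrary assumption), the reproduction snippet from the question can be evaluated lazily so that only a few column blocks exist in memory at once:

import numpy as np
import dask.array as da

a = da.from_array(np.random.rand(15239, 1), chunks=(15239, 1))
b = da.from_array(np.random.rand(1, 329960), chunks=(1, 5000))   # arbitrary chunk width
d = da.from_array(np.random.rand(15239, 1), chunks=(15239, 1))
e = da.from_array(np.random.rand(1, 329960), chunks=(1, 5000))

# Broadcasting only builds a lazy (15239, 329960) task graph; nothing large is allocated yet.
g = da.sqrt((a - b) ** 2 + (d - e) ** 2).min(axis=0)

result = g.compute()    # evaluated block by block; peak memory is a handful of ~0.6 GiB blocks
print(result.shape)     # (329960,)

The same idea can also be written as a plain NumPy loop over column slices if adding a dask dependency is not an option.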
Matt Hall
  • Getting `Overflow occurred in squared` when using `float16`. Yes, the data is dense. I guess I have to try using dask now – Beginner Oct 07 '19 at 18:02
  • Can you please have a look at this problem I am having while converting to a dask array: https://stackoverflow.com/questions/58277168/dask-implementation-for-mutation-operation – Beginner Oct 07 '19 at 22:29