I have a one-dimensional NumPy array of 1s and 0s, for example:

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])

I want to replace each run of consecutive 0s with 1s if the run is shorter than a threshold, say 2. The leading and trailing runs of 0s are excluded. The output would be a new array like this:

out: [0,1,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,1,1,0,0]

If the threshold is 4, the output would be:

out: [0,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,0,0]

What I do is count the length of each segment. I got this solution from this answer:

segLengs = np.diff(np.flatnonzero(np.concatenate(([True], a[1:]!= a[:-1], [True] ))))

out: [1,3,7,1,1,2,3,2,2]

Then I find the segments that are shorter than the threshold:

gaps = np.where(segLengs <= threshold)[0]
gapsNeedPadding = gaps[gaps % 2 == 0]

And then I loop through the gapsNeedPadding array.
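The counting and looping steps described above can be sketched roughly as follows. This is only an illustration: the function name `fill_short_gaps_loop` is hypothetical, it uses a strict `<` to match the "less than a threshold" wording (the snippet above uses `<=`), and it assumes a non-empty input.

```python
import numpy as np

def fill_short_gaps_loop(a, threshold):
    """Fill interior runs of 0s shorter than `threshold` (hypothetical helper)."""
    a = np.asarray(a)
    out = a.copy()
    # Positions where a new run starts, plus the end of the array.
    bounds = np.flatnonzero(np.concatenate(([True], a[1:] != a[:-1], [True])))
    starts = bounds[:-1]
    seg_lengths = np.diff(bounds)
    # Candidate runs: shorter than the threshold (assumption: strict "<").
    short = np.flatnonzero(seg_lengths < threshold)
    # Keep only runs of 0s that are neither the first nor the last run.
    keep = (a[starts[short]] == 0) & (short > 0) & (short < len(seg_lengths) - 1)
    for g in short[keep]:
        out[starts[g]:starts[g] + seg_lengths[g]] = 1
    return out

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
print(fill_short_gaps_loop(a, 2))
print(fill_short_gaps_loop(a, 4))
```

The loop only visits the runs that need filling, so its cost depends on the number of short gaps rather than on the array length.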

itertools.groupby could also do the job, but it would be a bit slow.
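For comparison, a groupby version might look something like this. It is a sketch under the same assumptions (strict "shorter than" comparison, leading and trailing runs left alone), and `fill_short_gaps_groupby` is a made-up name.

```python
from itertools import groupby

def fill_short_gaps_groupby(values, threshold):
    """Pure-Python sketch: fill interior runs of 0s shorter than `threshold`."""
    # Collapse the sequence into (value, run_length) pairs.
    runs = [(value, len(list(group))) for value, group in groupby(values)]
    out = []
    for i, (value, length) in enumerate(runs):
        is_interior = 0 < i < len(runs) - 1
        if value == 0 and is_interior and length < threshold:
            value = 1  # short interior gap: fill it
        out.extend([value] * length)
    return out

a = [0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0]
print(fill_short_gaps_groupby(a, 2))
print(fill_short_gaps_groupby(a, 4))
```

Every element passes through Python-level iteration here, which is why this is expected to be slower than an array-based approach on long inputs.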

Is there a more efficient solution? I would prefer a vectorized solution, since speed is what I need. I already have a slow solution that loops through the array.

Update

I tried the solution provided by @divakar in this question, but it does not seem to solve my problem when the threshold is larger.

numpy_binary_closing and binary_closing produce different output. Also, neither function closes gaps that lie within about threshold elements of the boundaries.

Did I make a mistake in the following code?

import numpy as np
from scipy.ndimage import binary_closing

def numpy_binary_closing(mask,threshold):

    # Define kernel
    K = np.ones(threshold)

    # Perform dilation and threshold at 1
    dil = np.convolve(mask, K, mode='same') >= 1

    # Perform erosion on the dilated mask array and threshold at given threshold
    dil_erd = np.convolve(dil, K, mode='same') >= threshold
    return dil_erd

threshold = 4
mask = np.random.rand(100) > 0.5

print(mask.astype(int))
out1 = numpy_binary_closing(mask, threshold)
out2 = binary_closing(mask, structure=np.ones(threshold))
print(out1.astype(int))
print(out2.astype(int))
print(np.allclose(out1,out2))

Output

[0 1 1 0 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 1 1 1 1 0 0 0 1 1 0 0 0 1 1 0 1 0 1 0 0 0 0 1 0 0 1 0 1 1 1 1 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 0 0 0 1 0 0 0 0 1 0 1 1 1]

[0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 0]

[0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 0]

False
Eric So

1 Answer

In the absence of any better idea:

for _ in range(threshold - 1):
    a |= np.roll(a, 1)

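(This code does not take care of the trailing zeros. It also extends every run of 1s to the right by threshold - 1 positions, so gaps longer than the threshold are partly filled as well.)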
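A different approach from the rolling OR above, offered only as a sketch: compute the run lengths as in the question, build a per-run boolean flag, and expand it back to element level with np.repeat. The name `fill_short_gaps` is hypothetical, the comparison is assumed to be a strict "shorter than threshold", and the input is assumed to be non-empty.

```python
import numpy as np

def fill_short_gaps(a, threshold):
    """Loop-free sketch: fill interior runs of 0s shorter than `threshold`."""
    a = np.asarray(a)
    # Positions where a new run starts, plus the end of the array.
    bounds = np.flatnonzero(np.concatenate(([True], a[1:] != a[:-1], [True])))
    starts = bounds[:-1]
    lengths = np.diff(bounds)
    # One flag per run: True for runs of 0s shorter than the threshold.
    fill = (a[starts] == 0) & (lengths < threshold)
    # Never fill the first or last run (leading and trailing zeros).
    fill[0] = False
    fill[-1] = False
    out = a.copy()
    # Expand the per-run flags to one flag per element and apply them.
    out[np.repeat(fill, lengths)] = 1
    return out

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
print(fill_short_gaps(a, 2))
print(fill_short_gaps(a, 4))
```

Unlike the in-place `|=` loop, this leaves the input array unchanged and never touches gaps that are at least `threshold` long.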

DYZ
  • If the threshold is 3, it outputs Out[7]: array([0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1]). It does not seem to be working? – Eric So Sep 04 '17 at 07:42