
I have a numpy array

import numpy as np

array = np.array([5,100,100,100,5,5,100,100,100,5])

I create a mask with boolean indexing like so:

mask = (array < 30)

This gives a mask like

[ True False False False  True  True False False False  True]

I can get the indices of the True values in the mask with

indices = np.where(mask)[0]

This gives

[0 4 5 9]

For every True value in the mask, I would like to modify the next 2 elements to also be True.

I can do this with a for loop like so:

for i in indices:
    mask[i:i+3] = True
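For comparison with the vectorized answers, here is a minimal sketch of the same loop wrapped in a function. The name `smear_loop` and its parameter `n` (how many following elements to set) are hypothetical; `n=2` reproduces the loop above. It works on a copy so the input mask is left unchanged.

```python
import numpy as np

def smear_loop(mask, n):
    """Hypothetical reference helper: for every True at index i, set out[i:i+n+1] = True."""
    out = mask.copy()
    # Indices come from the original mask, so newly set elements do not propagate further.
    for i in np.where(mask)[0]:
        out[i:i + n + 1] = True  # slicing past the end is silently truncated
    return out

array = np.array([5, 100, 100, 100, 5, 5, 100, 100, 100, 5])
mask = array < 30
print(smear_loop(mask, 2))
# [ True  True  True False  True  True  True  True False  True]
```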

Is there a more numpythonic approach to this without using a for loop?

Desired mask output:

[ True  True  True False  True  True  True  True False  True]

The main priority here is performance.

Mad Physicist
Craig Nathan
1 Answer


You can use np.flatnonzero to get the indices more simply. Then you can add np.arange(3) to each one, so that every index i becomes the row i, i+1, i+2:

ind = np.flatnonzero(mask)[:, None] + np.arange(3)

The only caveat is that the last few indices may run past the end of the array (here 10 and 11). You can clamp them to the last valid index with a mask or np.clip:

ind[ind >= mask.size] = mask.size - 1

You can then apply the index directly, since numpy allows arbitrary dimensions for fancy indices:

mask[ind] = True
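Putting the indexing steps together, a sketch of the whole approach as one function might look like this. The helper name `smear_index` and the parameter `n` are hypothetical, and it clamps with `np.minimum` instead of the boolean-mask assignment (same effect). The temporary index array has (number of True elements) × (n + 1) entries, which is where the extra memory goes.

```python
import numpy as np

def smear_index(mask, n):
    """Hypothetical helper: set mask[i:i+n+1] for every True at i, via fancy indexing."""
    out = mask.copy()
    # Broadcasting (k, 1) + (n + 1,) gives a (k, n + 1) array of rows i, i+1, ..., i+n.
    ind = np.flatnonzero(mask)[:, None] + np.arange(n + 1)
    # Clamp out-of-bounds indices to the last valid index (equivalent to the mask trick).
    np.minimum(ind, mask.size - 1, out=ind)
    # A 2-D integer index array is allowed for fancy assignment.
    out[ind] = True
    return out

mask = np.array([5, 100, 100, 100, 5, 5, 100, 100, 100, 5]) < 30
print(smear_index(mask, 2))
# [ True  True  True False  True  True  True  True False  True]
```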

If you have a small smear to do, you can smear the mask directly:

mask[1:] |= mask[:-1]
mask[1:] |= mask[:-1]
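A self-contained run of the two-line smear on the question's data might look like the following. The comments rely on NumPy's documented behavior since version 1.13: an in-place ufunc whose input and output overlap is computed as if the right-hand side had been copied first, so each line extends every run of True by exactly one element instead of cascading.

```python
import numpy as np

array = np.array([5, 100, 100, 100, 5, 5, 100, 100, 100, 5])
mask = array < 30

# OR the mask with itself shifted right by one; overlapping operands are safe (NumPy >= 1.13).
mask[1:] |= mask[:-1]   # every True run now extends one element to the right
mask[1:] |= mask[:-1]   # and now two elements
print(mask)
# [ True  True  True False  True  True  True  True False  True]
```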

You'll obviously have to put this in a loop if the smear amount is arbitrary, but you can optimize it by stepping in powers of two.

I call the operation mask[1:] |= mask[:-1] smearing because it expands the size of any group of True elements to the right by one, as if you smeared the ink with your finger. To smear an arbitrary amount n:

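s = 1
while 2 * s - 1 <= n:  # smear so far is s - 1; this shift raises it to 2*s - 1
    mask[s:] |= mask[:-s]
    s *= 2
s = n - (s - 1)  # smear still missing; smaller than the current run length, so no gaps
if s:
    mask[s:] |= mask[:-s]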
Mad Physicist
  • Thanks for your response. I'm getting an "IndexError: index 10 is out of bounds for axis 0 with size 10" on the "mask[ind] = True" line – Craig Nathan Apr 13 '21 at 16:44
  • Right. I forgot about that. Hang on – Mad Physicist Apr 13 '21 at 16:50
  • @CraigNathan. Fixed. Use the second option (mask smear) – Mad Physicist Apr 13 '21 at 16:52
  • Thank you. Would you mind explaining what smearing is and how I might set up a loop for it? What would I use as my iterator? – Craig Nathan Apr 13 '21 at 16:59
  • Never mind, I got it. Thanks! Marking this as the answer, but ideally I would like to avoid using a for loop if you have any suggestions for that. This is an example array, my project has a much larger dataset and takes hours to run when using the full dataset so I'm trying to shave off time everywhere I can. – Craig Nathan Apr 13 '21 at 17:08
  • Updated the answer – Mad Physicist Apr 13 '21 at 17:20
  • @CraigNathan. If you have a small amount of smear, you can hard code it, no loop necessary. If it's not so small, use the indexing approach. It's more memory intensive, but very very fast – Mad Physicist Apr 13 '21 at 17:22
  • Great, thank you. Can you explain what adding np.arange(3) does? If I take it out it doesn't seem to affect the output. Do I even need to keep track of the indices since the smearing doesn't use them? – Craig Nathan Apr 13 '21 at 17:39
  • @CraigNathan. If you inspect the result of that statement, you'll see immediately. It adds a second dimension that contains `i`, `i+1`, `i+2` for each index `i`. The reason I break the code into bits like that is so that you have an easier time inspecting and playing with them. – Mad Physicist Apr 13 '21 at 17:43
  • Thank you for your help! – Craig Nathan Apr 13 '21 at 18:45
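To make the `np.arange(3)` step from the comments concrete, inspecting the intermediate arrays on the question's mask shows the extra dimension. Without the `+ np.arange(3)`, `ind` would hold only the positions that are already True, so assigning through it would not change the mask. Rows such as `[9 10 11]` also show why the out-of-bounds indices have to be clamped.

```python
import numpy as np

mask = np.array([True, False, False, False, True, True, False, False, False, True])

idx = np.flatnonzero(mask)   # shape (4,): indices of the True elements
col = idx[:, None]           # shape (4, 1): one index per row
ind = col + np.arange(3)     # broadcasts (4, 1) + (3,) -> (4, 3): rows i, i+1, i+2
print(ind)
# [[ 0  1  2]
#  [ 4  5  6]
#  [ 5  6  7]
#  [ 9 10 11]]
```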