
I have a numpy array like this:

a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1]])

Question 1: As shown in the title, I want to replace all elements after the first zero in each row with zeros. The result should look like this:

a = np.array([[1, 0, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0]])

Question 2: How can I slice different columns for each row, as in this example? I am dealing with a large array, so an efficient solution would be much appreciated. Thank you very much.

G-09

4 Answers


One way to accomplish Question 1 is to use numpy.cumprod:

>>> np.cumprod(a, axis=1)
array([[1, 0, 0, 0, 0],
       [1, 1, 1, 1, 0],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0]])
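
This works because the cumulative product along each row stays at 1 until the first zero appears and is 0 from then on. If the entries are not restricted to 0 and 1, a small sketch of the same idea (with made-up example values) would be to take the cumulative product of a boolean mask instead:

>>> import numpy as np
>>> b = np.array([[3, 0, 5, 7, 1],
...               [2, 4, 6, 8, 9]])
>>> b * np.cumprod(b != 0, axis=1)  # mask stays 1 until the first zero
array([[3, 0, 0, 0, 0],
       [2, 4, 6, 8, 9]])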
V. Ayrat

Question 1: You could iterate over the array like so:

for i in range(a.shape[0]):
    row = a[i]
    j = 0
    # advance to the first zero (stop at the end if the row has none)
    while j < row.shape[0] and row[j] > 0:
        j += 1
    row[j + 1:] = 0  # everything after the first zero becomes zero

This will change the array in place. If you are interested in very high performance, the answers to this question could be of use for finding the first zero faster; np.where scans the entire array and is therefore not optimal for this task. Actually, the fastest solution will depend a bit on the distribution of your array entries: if the rows contain many nonzero floats and a zero only rarely appears, the while loop in the code above terminates late on average, so only "a few" zeros need to be written. If, however, there are only two possible entries, as in your sample array, and they occur with similar probability (i.e. ~50%), a lot of zeros would have to be written to a, and the following will be faster:

b = np.zeros(a.shape, dtype=a.dtype)
for i in range(a.shape[0]):
    a_row = a[i]
    b_row = b[i]
    j = 0
    # copy entries into b until the first zero (or the end of the row)
    while j < a_row.shape[0] and a_row[j] > 0:
        b_row[j] = a_row[j]
        j += 1

Question 2: If you mean to slice each row individually on a similar criterion involving a first occurrence of some kind, you could simply adapt this iteration pattern. If the criterion is more global (like finding the maximum of the row, for example), built-in methods like np.where exist that will be more efficient, but which choice is best probably depends a bit on the criterion itself.
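
For illustration, here is a minimal sketch of that pattern (variable names are illustrative only), using argmax over the a == 0 mask as the per-row criterion to slice each row up to its first zero; since the pieces have different lengths, they have to be collected in a list rather than a rectangular array:

import numpy as np

a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1]])

first_zero = (a == 0).argmax(axis=1)   # position of the first zero per row
has_zero = (a == 0).any(axis=1)        # argmax gives 0 for rows without any zero

# the pieces have different lengths, so collect them in a list
leading_parts = [row[:stop] if found else row
                 for row, stop, found in zip(a, first_zero, has_zero)]
# leading_parts[1] -> array([1, 1, 1, 1])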

David Wierichs

Question 1: An efficient way to do this would be the following.

import numpy as np

a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1]])

for row in a:
    zeros = np.where(row == 0)[0]  # indices of the zeros in this row
    if len(zeros):                 # check if a zero exists
        row[zeros[0]:] = 0         # zero out everything from the first zero on

print(a)

Output:

[[1 0 0 0 0]
 [1 1 1 1 0]
 [1 0 0 0 0]
 [1 0 0 0 0]]

Question 2: Using the original array a from the question (i.e. before the in-place modification above), for each row index rowIdx you can pass an array of column indices colIdxs that you want to extract:

rowIdx = 2
colIdxs = [1, 3, 4]
print(a[rowIdx, colIdxs])

Output:

[0 1 1]
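
If you instead want one (possibly different) column from every row at once, fancy indexing also accepts a pair of equal-length index arrays, one with row indices and one with column indices. A small sketch, again on the original array, with the columns chosen arbitrarily for illustration:

import numpy as np

a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1]])

rows = np.arange(a.shape[0])   # [0, 1, 2, 3]
cols = np.array([0, 4, 3, 2])  # one column index per row (chosen arbitrarily)
print(a[rows, cols])           # [1 0 1 1]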
Varun Nayak

I prefer Ayrat's creative answer to the first question, but if you need to slice different columns for different rows of a large array, this could help you:

# np.s_ builds a slice from the first zero of each row to the end;
# note that argmax returns 0 for rows that contain no zero at all
indexer = tuple(np.s_[i:a.shape[1]] for i in (a == 0).argmax(axis=1))
for i, j in enumerate(indexer):
    a[i, j] = 0

indexer:

(slice(1, 5, None), slice(4, 5, None), slice(1, 5, None), slice(1, 5, None))

or:

indexer = (a == 0).argmax(axis=1)  # first zero of each row
for i in range(a.shape[0]):
    a[i, indexer[i]:] = 0

indexer:

[1 4 1 1]

output:

[[1 0 0 0 0]
 [1 1 1 1 0]
 [1 0 0 0 0]
 [1 0 0 0 0]]
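
If the Python loop over the rows itself becomes a bottleneck for a very large array, the same argmax-based index can also be turned into a boolean mask via broadcasting, so the whole replacement happens in one vectorized assignment. A sketch; the extra any check keeps rows without any zero untouched, since argmax returns 0 for them:

import numpy as np

a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1]])

first_zero = (a == 0).argmax(axis=1)                  # [1 4 1 1]
mask = np.arange(a.shape[1]) >= first_zero[:, None]   # True from the first zero on
mask &= (a == 0).any(axis=1)[:, None]                 # leave all-nonzero rows alone
a[mask] = 0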
Ehsan