-1

I'm looking to count the number of False values that precede at least one True value for each row in a 2D Numpy array.

For example the desired function would act as below:

>>> x = np.array([[0,0,1,0],[1,1,0,1],[1,1,1,1]], dtype='bool')
>>> x
array([[False, False,  True, False],
       [ True,  True, False,  True],
       [ True,  True,  True,  True]])
>>> foo(x)
3

The result is 3 since the first row has two False before a True value, with the final False ignored. There's one False that precedes True on the second row and none on the third.

Rows with no True value (all False) count as 0.

Is there a vectorised means of calculating this without iterating through the rows? Currently the only ways I have would be counting transitions with numpy.diff() or iterating through each row individually.

benjscho
  • 13
  • 5

4 Answers4

0
import numpy as np
x = np.array([[0,0,1,0],[1,1,0,1],[1,1,1,1]], dtype='bool')
# reverse x:
idx = np.arange(x.shape[1]-1, -1, -1)
x_reverse = x[:, idx]
# find the number of Falses that come after the last true
x_int = x_reverse.astype(int)
x_sparse = (x_int==np.cumsum(x_int, axis = 1))*x_int
rows, falses_after = np.where(x_sparse,)
# find the total number of Falses
falses_sum = (1-x).sum(1)
#subtract the number of falses that come after the last true from the total number of falses
falses_before = falses_sum-falses_after

Output:

falses_before = [2 1 0]

Edit: I didn't realize you were looking for the sum of Falses that come before Trues. you can get this number by doing falses_before.sum()

yann ziselman
  • 1,952
  • 5
  • 21
  • Thanks, this is great! – benjscho Jun 22 '21 at 13:41
  • Realised an issue with this on testing, empty rows have values added. E.g., `array([[False, False, False], [False, True, False],[False, False, False]])` should return only 1, but returns 5 instead due to the empty rows. Partly my fault for not clarifying empty row behaviour! Rows of all False should sum to 0 as they do not precede a True value. – benjscho Jun 22 '21 at 14:18
0

You can use this snippet. np.where(i == True) is looking for index of item in array, and then we use j[0]+1 to get our first occurrence and add 1, because arrays are numbered from 0

import numpy as np

x = np.array([[0,0,1,0],[1,1,0,1],[1,1,1,1]], dtype='bool')

for i in x:
    j, = np.where(i == True)
    print(j[0]+1)
0

This can be solved by finding the last item on each row, and using the indexes to create a mask. The mask is then used to count the instances of False.

The last_nonzero function is taken from this post.

def last_nonzero(arr: np.ndarray, axis=0, invalid_val=-1):
    """
    Attribution: https://stackoverflow.com/a/47269413/14354978
    """
    mask = arr != False
    val = arr.shape[axis] - np.flip(mask, axis=axis).argmax(axis=axis) - 1
    return np.where(mask.any(axis=axis), val, invalid_val)

def count_false_pre_true(arr: np.ndarray):
    mask = np.zeros(arr.shape, dtype='bool')
    up_to = last_nonzero(arr, axis=1, invalid_val=0)
    for i, m in enumerate(up_to):
        mask[i, :m] = True
    return (arr[mask] == False).sum()

The enumerate loop to edit the mask is the only part that needs to be vectorised, but I can't figure out how to do that, any comments would be appreciated!

This works only with 2D numpy arrays, so 1D arrays need to be wrapped again.

benjscho
  • 13
  • 5
0

As a oneliner:

import numpy as np

x = np.array([[0,0,1,0],[1,1,0,1],[1,1,1,1]], dtype='bool')
np.max((~x).cumsum(axis = 1) * x, axis = 1 )

# array([2, 1, 0])

To show the calculation:

not_x = ~x    # Reverse the Trues and Falses
temp = not_x.cumsum( axis = 1 )  # cumsum the Falses by row.
temp

# array([[1, 2, 2, 3],
#        [0, 0, 1, 1],
#        [0, 0, 0, 0]])

temp *= x    # Zero the result only where x is False
temp 
# array([[0, 0, 2, 0],
#        [0, 0, 0, 1],
#        [0, 0, 0, 0]])     

temp.max( axis = 1 )  # Find the max in each row.
# array([2, 1, 0])
Tls Chris
  • 3,564
  • 1
  • 9
  • 24