More efficient solution to find longest series based on boolean in NumPy ndArray

Question

I search my ndArray to find longest series based on True values. Is there an option to find longest series without looping through array?

I've already wrote my own solution with numpy.nonzero, but there is probably better one.

import numpy as np
arr = np.array([[[1,2,3,4,5],
                [6,7,8,9,10],
                [11,12,13,14,15],
                [16,17,18,19,20],
                [21,22,23,24,25]],
                [[True,True,True,False,True],
                [True,True,True,True,False],
                [True,True,False,True,True],
                [True,True,True,False,True],
                [True,True,True,False,True]]])

def getIndices(arr):
    arr_to_search = np.nonzero(arr)
    arrs = []
    prev_el0 = 0
    prev_el1 = -1
    activ_long = []
    for i in range(len(arr_to_search[0])):
        if arr_to_search[0][i] == prev_el0:
            if arr_to_search[1][i] != prev_el1 + 1:
                arrs.append(activ_long)
                activ_long = []
        else:
            arrs.append(activ_long)
            activ_long = []
        activ_long.append((arr_to_search[0][i],arr_to_search[1][i]))
        prev_el0 = arr_to_search[0][i]
        prev_el1 = arr_to_search[1][i]

    max_len = len(max(arrs,key=len))
    longest_arr_list = [a for a in arrs if len(a) == max_len]
    return longest_arr_list

print(getIndices(arr[1,:,:]))
print(getIndices(arr[1,:,:].T))


[[(1, 0), (1, 1), (1, 2), (1, 3)]]
[[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)], [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4)]]

If your code works, might be more appropriate for code review — user3483203, Dec 30 '18 at 15:13
What do you mean with `find longest series based on True values` ? What is the expected output? — yatu, Dec 30 '18 at 15:14
By series i mean longest subarray where value is True, like in example below: [True,True,True,False,True] [True,True,True,True,False] [True,True,False,True,True] [True,True,True,False,True] [True,True,True,False,True] here, longest series is in col 0 and col 1, 5 x True, False is breaking series — grudzien, Dec 30 '18 at 15:25

score 0 · Accepted Answer · answered Dec 30 '18 at 16:21

Here is a numpy solution which avoids explicit loops based on this previous question.

I'm assuming the boolean array is named a. Essentially we find the indices for where rows change from 0 to 1 or from 1 to 0 and look at the difference between these. By padding 0s on the frong and back, we ensure that for every transition from a 0 to a 1, there is another transition from a 1 to a 0.

For convenience I process a and a.T at the same time, but you could do them separately if you want.

m,n = a.shape
A = np.zeros((2*m,n+2))
A[:m,1:-1] = a
A[m:,1:-1] = a.T

dA = np.diff(A)

start = np.where(dA>0)
end = np.where(dA<0)

argmax_run = np.argmax(end[1]-start[1])

row = start[0][argmax_run]
col_start = start[1][argmax_run]
col_end= end[1][argmax_run]-1

max_len = col_end - col_start + 1

print('max run of length {}'.format(max_len))
print('in '+('row' if row<m else'col')+' {}'.format(row%m)+' from '+('col' if row<m else'row')+' {} to {}'.format(col_start,col_end))

To improve performance and storage we can change A to a boolean array. Since the -1 and 1 in dA above always come in pairs, we can find start and end as below.

nz = np.nonzero(dA)
start = (nz[0][::2], nz[1][::2])
end = (nz[0][1::2], nz[1][1::2])

Note that you can then remove the variables start and end entirely as they aren't really needed.

More efficient solution to find longest series based on boolean in NumPy ndArray

1 Answers1

Linked