
I have a (large) NumPy array of boolean values, where the True values mark the zero-crossings of another, previously processed signal. I am only interested in the indices of the first and last True values in the array, so I'd rather not use np.where or np.argwhere, to avoid going through the entire array unnecessarily.

I'd prefer not to convert my array into a list and use the index() method. Or should I?

So far, the best solution I have come up with is the following:

# Alias for compactness
enum = enumerate

# Example array, desired output: (3, -5)
a = np.array([False, False, False, True, ..., True, False, False, False, False])


out = next(idx for idx in zip((i for i, x in enum(a) if x), (-i-1 for i, x in enum(a[::-1]) if x)))

print(out)
# (3, -5)

But I still find it a bit bulky, and I wonder whether NumPy already implements such functionality with a different syntax. Does it?

Martí
  • "large" is relative. What is a typical size of your array? – Warren Weckesser Dec 21 '22 at 10:54
  • anywhere between 2^15 and 2^25 – Martí Dec 21 '22 at 11:21
  • And are there exactly two True values, and the rest is False? Or there can be more then two True values in there? – kalzso Dec 21 '22 at 13:59
  • you can use `out=(np.nonzero(a)[0][0],np.nonzero(a)[0][-1])` – phœnix Dec 21 '22 at 14:15
  • @phœnix that is exactly what my answer is. – kalzso Dec 21 '22 at 14:48
  • @kalzso there are as many True values as there were zero-crossings in the previously processed signal, so we must assume an arbitrary number of True values; the case of only two True values is valid but unlikely. – Martí Dec 21 '22 at 15:16
  • @Martí Then that means you have to go through the whole array, otherwise how can you know when you have found the last True value? – kalzso Dec 21 '22 at 18:01
  • @kalzso starting from the front finds the first occurrence and starting from the end finds the last one, so there is no need to go through the whole array. That is why I like jan992's solution: argmax stops at the first occurrence found, so you only need to call it once from the front and once from the tail. – Martí Dec 22 '22 at 01:20

3 Answers

2

Benchmarking results

Well, so I decided to compare the proposed solutions in a typical scenario. The three approaches are:

import random

import numpy as np

# @timeit is a custom timing decorator (see the note below the functions)
@timeit
def gener_sol(a):
    # My OP proposal
    return next(idx for idx in zip((i for i, x in enumerate(a) if x), (a.size-i-1 for i, x in enumerate(a[::-1]) if x)))

@timeit
def nonzero_sol(a):
    # kalzso's proposal
    return np.nonzero(a)[0][[0, -1]]

@timeit
def argmax_sol(a):
    # jan992's proposal
    return np.argmax(a), a.size - np.argmax(a[::-1]) - 1

NOTE: @timeit is a custom wrapper of mine that measures a function's execution time. Whether or not it is perfect, it is the same for all three functions, so the comparison should be fair.
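
For reference, since @timeit itself is not shown, here is a minimal sketch of what such a decorator could look like. This is not the actual implementation, just an illustration consistent with the description: it accumulates wall-clock time and run counts per function and reports the average per run. The names timeit, report and _stats, and the use of time.perf_counter, are my own choices.

import time
from collections import defaultdict
from functools import wraps

# Per-function accumulators: name -> [number of runs, total elapsed seconds]
_stats = defaultdict(lambda: [0, 0.0])

def timeit(func):
    # Time every call of func and accumulate its run count and total time.
    @wraps(func)
    def wrapper(*args, **kwargs):
        tic = time.perf_counter()
        result = func(*args, **kwargs)
        entry = _stats[func.__name__]
        entry[0] += 1
        entry[1] += time.perf_counter() - tic
        return result
    return wrapper

def report():
    # Print total and average time per function, fastest total first.
    for name, (runs, total) in sorted(_stats.items(), key=lambda kv: kv[1][1]):
        print(f"{name:<15}: {runs} runs, {total:.3f} s  ({total / runs * 1e6:.2f} µs/run)")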

And here is the code that runs them in a loop:

for rep in range(10000):
    N = 2**20  # Array size
    depth = N // 10  # max depth from the edges for first and last True
    
    arr = np.random.randint(0, 2, N).astype(bool)

    # First and last True values are randomly located at both ends
    first = random.randint(0, depth)
    arr[:first] = False
    arr[first]  = True

    # At least 1, so that arr[-last:] never clears the whole array (last = 0 would)
    last = random.randint(1, depth)
    arr[-last:] = False
    arr[-last]  = True

    arg_res = argmax_sol(arr)
    non_res = nonzero_sol(arr)
    gen_res = gener_sol(arr)

    # Ensure they all return the same
    assert arg_res == tuple(non_res) == gen_res

And the results are the following:

N=2^20 (10000 runs)

argmax_sol....:   10000 runs,       6.38 s  (638.01 μs/run)
nonzero_sol...:   10000 runs,     13.124 s  (1.312 ms/run)
gener_sol.....:   10000 runs,     42.695 s  (4.27 ms/run)

N=2^25 (100 runs)

argmax_sol....:     100 runs,      1.891 s  (18.908 ms/run)
nonzero_sol...:     100 runs,      4.125 s  (41.25 ms/run)
gener_sol.....:     100 runs,     13.799 s  (137.99 ms/run)

So, in this scenario the argmax solution is the clear winner and the generator the clear loser. However, I noticed that the depth at which the first and last indices are located (up to N/10 in this case) has a significant impact. A reasonable value in my case is depth=10000 (or even less). Repeating the experiment with that depth, I obtain:

N=2^20 (100 runs) depth=10000

gener_sol.....:     100 runs,    39.967 ms  (399.671 μs/run)
argmax_sol....:     100 runs,    58.851 ms  (588.505 μs/run)
nonzero_sol...:     100 runs,   138.077 ms  (1.381 ms/run)

As you can see, in this case the generator yields better results, and the improvement is even larger if depth can be bounded to an even smaller number.

See for instance:

N=2^20 (10000 runs) depth=1000

gener_sol.....:   10000 runs,      0.947 s  (94.753 μs/run)
argmax_sol....:   10000 runs,      6.082 s  (608.196 μs/run)
nonzero_sol...:   10000 runs,     14.282 s  (1.428 ms/run)

N=2^25 (1000 runs) depth=1000

gener_sol.....:    1000 runs,      0.053 s  (53.08 μs/run)
argmax_sol....:    1000 runs,      18.84 s  (18.84 ms/run)
nonzero_sol...:    1000 runs,     44.058 s  (44.058 ms/run)

Wow, that's a dramatic improvement: ~6 and ~350 times faster than NumPy's argmax for N=2^20 and N=2^25, respectively. So one should not underestimate the power of for loops/generators under the right circumstances.

Finally, to leave a good word for kalzso's nonzero solution, I have found one case in which it outperforms the others: when the array is False everywhere except at the two locations to be found.

Therefore, changing the line:

# arr = np.random.randint(0, 2, N).astype(bool)
arr = np.zeros(N).astype(bool)  # all False, so only the two inserted True values remain

N=2^20 (10000 runs) depth=N//10, only two True values

nonzero_sol...:   10000 runs,      1.612 s  (161.186 μs/run)
argmax_sol....:   10000 runs,      6.482 s  (648.178 μs/run)
gener_sol.....:   10000 runs,     44.625 s  (4.463 ms/run)

and with shallower depths:

N=2^20 (10000 runs) depth=10000, only two True values

nonzero_sol...:   10000 runs,      1.571 s  (157.093 μs/run)
gener_sol.....:   10000 runs,      4.208 s  (420.791 μs/run)
argmax_sol....:   10000 runs,      6.235 s  (623.512 μs/run)

N=2^25 (1000 runs) depth=10000, only two True values

gener_sol.....:    1000 runs,      0.433 s  (433.125 μs/run)
nonzero_sol...:    1000 runs,      5.427 s  (5.427 ms/run)
argmax_sol....:    1000 runs,     19.128 s  (19.128 ms/run)

N=2^20 (10000 runs) depth=1000, only two True values

gener_sol.....:   10000 runs,      0.962 s  (96.286 μs/run)
nonzero_sol...:   10000 runs,      1.637 s  (163.687 μs/run)
argmax_sol....:   10000 runs,      6.522 s  (652.228 μs/run)

Again, the small depth relative to the array size is a key factor in favor of the generator expression.

In conclusion, the right solution depends very much on the scenario. For my particular case in which:

  • Arrays are large (2^20-2^25)
  • The array has multiple True values at unknown locations
  • The first and last True values are relatively close to the edges

the generator expression performs best, by up to one or two orders of magnitude.
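
If the generator approach is the one kept, it can be wrapped in a small helper for readability. This is just a sketch of my own (first_last_true is not a NumPy function); it raises StopIteration if the array contains no True value.

import numpy as np

def first_last_true(a):
    # Scan lazily from both ends: only the leading and trailing runs
    # of False are actually traversed.
    first = next(i for i, x in enumerate(a) if x)
    last = a.size - 1 - next(i for i, x in enumerate(a[::-1]) if x)
    return first, last

a = np.array([False, False, False, True, False, True, False, False, False, False])
print(first_last_true(a))  # (3, 5)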

Martí
  • Nice summary, interesting to see the results for different scenarios. – kalzso Dec 22 '22 at 10:01
  • So which answer will you accept? Which method suits you better? – kalzso Dec 23 '22 at 11:05
  • Well, I wish I could accept all answers ;) As I said, each one has its own application case. In my case, the generator option is the one that shows the best performance, so maybe I should accept my own answer? It's also the most informative one, with the benchmark and all :P – Martí Dec 24 '22 at 07:08
  • Could you please attach the script, or link to a GitHub repo with @timeit's code? Thanks – Jred0n29 Dec 26 '22 at 14:45
  • @Jred0n29 sorry, I cannot: it is code I developed within my company, so I am not free to disclose it. However, I can assure you it is nothing fancy; it's basically a wrapper that does ``tic=time.time()`` before the function call and ``dt = time.time()-tic`` after it, and averages over the number of executions. I'm sure it can be improved, and you can surely find dozens of implementations online. – Martí Dec 28 '22 at 01:50
1

How about this?

import numpy as np

a = np.array([False, False, False, True, False, True, False, False, False, False])

out = (np.argmax(a), a.size - np.argmax(a[::-1]) - 1)

np.argmax() stops at the first True value, so this might solve your problem of not wanting to iterate over the whole array? Inspired by this answer.
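
As a quick check on the example array above, this gives (the exact repr of the NumPy integers depends on the NumPy version):

print(out)
# (3, 5): first True at index 3, last True at index 5

One caveat worth noting: if the array contains no True at all, np.argmax returns 0, so a separate check (e.g. a.any()) is needed to distinguish that case.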

jan992
  • Yeah, I like it! I wonder if there is a difference in performance between np.argmax and the generator version (I guess it is compute-in-place vs compute-on-demand? I might be wrong) – Martí Dec 21 '22 at 11:28
  • I think if you vectorize your code with NumPy to avoid loops, it will always run faster than with Python loops or iterators. – kalzso Dec 21 '22 at 11:31
  • Yes @kalzso, I guess that's true if the Python loop runs over the entire array, but if the first and last ``True`` are relatively close to the edges of the array, it might end up being faster than a NumPy operation on the entire array... – Martí Dec 21 '22 at 11:41
  • The algorithm that computes the result is what takes the time. Once you have the result, you get its first and last elements in constant time. No need to use an iterator for that. – kalzso Dec 21 '22 at 11:43
  • Exactly, and the question is which of the two is faster. I guess I'll have to benchmark it to see which solution wins (constant time vs average time when my first and last ``True`` values vary in position but are statistically always near the same locations). – Martí Dec 21 '22 at 11:47
  • I also think the pure numpy version should be faster, but keen to hear about your benchmark :) – jan992 Dec 21 '22 at 13:13
  • So guys, jan992, @kalzso, I did find some free time and went for the benchmark ;) You can find the results in the additional answer to my OP. Please let me know if you have further insights into the topic! – Martí Dec 22 '22 at 03:01
  • Thanks for sharing your benchmark, very interesting to see! I'll keep that in mind next time I come across a similar issue :-). – jan992 Dec 22 '22 at 13:52
1

Use NumPy's nonzero function, which returns the indices of all non-zero elements in the input array:

a = np.array([False, False, False, True, ..., True, False, False])

# Find the indices of the non-zero elements
indices = np.nonzero(a)

# The first and last indices will be the first and last True values
first = indices[0][0]
last = indices[0][-1]
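
A quick check with a concrete array in place of the elided one above (the array here is my own small example):

import numpy as np

a = np.array([False, False, False, True, False, True, True, False])

indices = np.nonzero(a)   # (array([3, 5, 6]),)
first = indices[0][0]     # 3
last = indices[0][-1]     # 6
print(first, last)        # 3 6

Note that np.nonzero always scans the whole array, which is why this approach only comes out ahead in the sparse two-True scenario of the benchmark above.
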
kalzso