Benchmarking results
Well so I have decided to compare the proposed solutions in a typical case scenario. The three approaches are:
@timeit
def gener_sol(a):
# My OP proposal
return next(idx for idx in zip((i for i, x in enumerate(a) if x), (a.size-i-1 for i, x in enumerate(a[::-1]) if x)))
@timeit
def nonzero_sol(a):
# kalzso's proposal
return np.nonzero(a)[0][[0, -1]]
@timeit
def argmax_sol(a):
# jan992's proposal
return np.argmax(a), a.size - np.argmax(a[::-1]) - 1
NOTE: @timeit
is a custom wrapper I have to measure functions execution time. Beyond the question whether it is perfect or not, it is the same for all of them so it should be a fair comparison.
And here goes the piece of code to execute them in a loop:
for rep in range(10000):
N = 2**20 # Array size
depth = N // 10 # max depth from the edges for first and last True
arr = np.random.randint(0, 2, N).astype(bool)
# First and last True values are randomly located at both ends
first = random.randint(0, depth)
arr[:first] = False
arr[first] = True
last = random.randint(0, depth)
arr[-last:] = False
arr[-last] = True
arg_res = argmax_sol(arr)
non_res = nonzero_sol(arr)
gen_res = gener_sol(arr)
# Ensure they all return the same
assert arg_res == tuple(non_res) == gen_res
And the results are the following:
N=2^20 (10000 runs)
argmax_sol....: 10000 runs, 6.38 s (638.01 μs/run)
nonzero_sol...: 10000 runs, 13.124 s (1.312 ms/run)
gener_sol.....: 10000 runs, 42.695 s (4.27 ms/run)
N=2^25 (100 runs)
argmax_sol....: 100 runs, 1.891 s (18.908 ms/run)
nonzero_sol...: 100 runs, 4.125 s (41.25 ms/run)
gener_sol.....: 100 runs, 13.799 s (137.99 ms/run)
So, in this scenario the argmax
solution is the clear winner, and the generator
the clear loser.
However, I noted that the depth
at which the first and last indices are located (up to N/10 in this case) can have a significant impact. A reasonable value in my case is depth=10000
(or even less). Therefore, if I repeat the experiment I obtain:
N=2^20 (10000 runs) depth = 10000
gener_sol.....: 100 runs, 39.967 ms (399.671 μs/run)
argmax_sol....: 100 runs, 58.851 ms (588.505 μs/run)
nonzero_sol...: 100 runs, 138.077 ms (1.381 ms/run)
As you see, in this case the generator
yields better results. And the improvement is even larger if depth
can be bound to even a smaller number.
See for instance:
N=2^20 (10000 runs) depth=1000
gener_sol.....: 10000 runs, 0.947 s (94.753 μs/run)
argmax_sol....: 10000 runs, 6.082 s (608.196 μs/run)
nonzero_sol...: 10000 runs, 14.282 s (1.428 ms/run)
N=2^25 (1000 runs) depth=1000
gener_sol.....: 1000 runs, 0.053 s (53.08 μs/run)
argmax_sol....: 1000 runs, 18.84 s (18.84 ms/run)
nonzero_sol...: 1000 runs, 44.058 s (44.058 ms/run)
wow, that's a dramatic improvement! ~6 and ~350 times faster than numpy's argmax
for N=2^20 and 2^25, respectively. So one should not underestimate the power of for loops/generators under appropriate circumstances.
Finally, and to leave a good word for kalszo's nonzero
solution, I do have found that there is a case in which it outperforms the others, and that is when the array is False
everywhere except in two locations (the ones to find).
Therefore, changing the line:
# arr = np.random.randint(0, 2, N).astype(bool)
arr = np.zeros(N).astype(bool)
N=2^20 depth=N//10
, only two True values
nonzero_sol...: 10000 runs, 1.612 s (161.186 μs/run)
argmax_sol....: 10000 runs, 6.482 s (648.178 μs/run)
gener_sol.....: 10000 runs, 44.625 s (4.463 ms/run)
and with shallower depths:
N=2^20 (10000 runs) depth=10000
, only two True values
nonzero_sol...: 10000 runs, 1.571 s (157.093 μs/run)
gener_sol.....: 10000 runs, 4.208 s (420.791 μs/run)
argmax_sol....: 10000 runs, 6.235 s (623.512 μs/run)
N=2^25 (1000 runs) depth=10000
, only two True values
gener_sol.....: 1000 runs, 0.433 s (433.125 μs/run)
nonzero_sol...: 1000 runs, 5.427 s (5.427 ms/run)
argmax_sol....: 1000 runs, 19.128 s (19.128 ms/run)
N=2^20 (10000 runs) depth=1000
, only two True values
gener_sol.....: 10000 runs, 0.962 s (96.286 μs/run)
nonzero_sol...: 10000 runs, 1.637 s (163.687 μs/run)
argmax_sol....: 10000 runs, 6.522 s (652.228 μs/run)
again, the small depth relative to the array size is a key factor in favor of the generator expression.
In conclusion, the right solution depends very much on the scenario. For my particular case in which:
- Arrays are large (2^20-2^25)
- The array has multiple True values at unknown locations
- The first and last True values are relatively close to the edges
the generator expression has the best performance with up to one or two orders of magnitude.