Consider the following simple test:

    import numpy as np
    from timeit import timeit
    a = np.random.randint(0, 2, 1000000, bool)

Let us find the index of the first True:

    timeit(lambda: a.argmax(), number=1000)
    # 0.000451055821031332

This is reasonably fast because numpy short-circuits.
It also works on contiguous slices:

    timeit(lambda: a[1:-1].argmax(), number=1000)
    # 0.0006490410305559635
But not, it seems, on non-contiguous ones. I was mainly interested in finding the last True:

    timeit(lambda: a[::-1].argmax(), number=1000)
    # 0.3737605109345168
UPDATE: My assumption that the observed slowdown was due to a lack of short-circuiting is inaccurate (thanks @Victor Ruiz). Indeed, even in the worst-case scenario of an all-False array

    b = np.zeros_like(a)
    timeit(lambda: b.argmax(), number=1000)
    # 0.04321779008023441

we are still an order of magnitude faster than in the non-contiguous case. I'm ready to accept Victor's explanation that the actual culprit is a copy being made (timings of forcing a copy with .copy() are suggestive). After that it doesn't really matter anymore whether short-circuiting happens or not.
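To make that suggestion concrete, here is the kind of comparison I ran (my own probe, not an authoritative benchmark; absolute numbers will vary by machine):

```python
import numpy as np
from timeit import timeit

a = np.random.randint(0, 2, 1000000, bool)

# If argmax on the reversed view is dominated by an internal copy,
# then an explicit copy alone should cost roughly as much as the
# whole argmax call on the view.
t_copy = timeit(lambda: a[::-1].copy(), number=100)
t_view = timeit(lambda: a[::-1].argmax(), number=100)
t_both = timeit(lambda: a[::-1].copy().argmax(), number=100)
print(t_copy, t_view, t_both)
```

If the copy explanation holds, t_copy and t_view should be of the same order, while the short-circuiting part of t_both adds almost nothing on top of its copy.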
But other step sizes != 1 yield similar behavior:

    timeit(lambda: a[::2].argmax(), number=1000)
    # 0.19192566303536296
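Both of the slow cases involve views that numpy itself reports as non-contiguous, which can be checked directly via the standard array flags:

```python
import numpy as np

a = np.random.randint(0, 2, 1000000, bool)

print(a.flags['C_CONTIGUOUS'])        # True: the base array, fast path
print(a[1:-1].flags['C_CONTIGUOUS'])  # True: a step-1 slice stays contiguous
print(a[::-1].flags['C_CONTIGUOUS'])  # False: negative stride
print(a[::2].flags['C_CONTIGUOUS'])   # False: step size 2
```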
Question: Why does numpy not short-circuit (update: without making a copy) in the last two examples?

And, more importantly: Is there a workaround, i.e. some way to force numpy to short-circuit (update: without making a copy) also on non-contiguous arrays?
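One workaround I can think of (a sketch of my own, not a numpy built-in; the function name and chunk size are hypothetical) is to scan the view chunk by chunk, copying only one small contiguous piece at a time. That restores short-circuiting in exchange for copying at most one chunk past the first True:

```python
import numpy as np

def first_true_chunked(arr, chunk=8192):
    """Return the index of the first True in a 1-D boolean view,
    scanning chunk by chunk so the search can stop early; only one
    small contiguous copy is made per chunk. `chunk` is a tuning
    knob. Returns -1 if the array contains no True at all."""
    n = arr.shape[0]
    for start in range(0, n, chunk):
        piece = np.ascontiguousarray(arr[start:start + chunk])
        idx = int(piece.argmax())
        if piece[idx]:  # argmax returns 0 on an all-False chunk
            return start + idx
    return -1

a = np.random.randint(0, 2, 1000000, bool)
a[0] = True  # ensure at least one True so the comparison is well-defined
# last True of a == first True of the reversed (non-contiguous) view
assert first_true_chunked(a[::-1]) == int(a[::-1].argmax())
```

Each `np.ascontiguousarray` call copies only `chunk` elements, so for an array whose first True appears early this does far less work than the full copy that the plain `a[::-1].argmax()` call appears to make.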