I developed a numpy
-only version that works, but after testing, I found that it performs quite poorly because it can't take advantage of short-circuiting. Since that's what you asked for, I describe it below. However, there is a much better approach using numba
with a lightly modified version of your code. (Note that all of these return the index of the first match in a
, rather than the value itself. I find that approach more flexible.)
@numba.jit(nopython=True)
def find_reps_numba(a, max_len):
streak = 1
val = a[0]
for i in range(1, len(a)):
if a[i] == val:
streak += 1
if streak >= max_len:
return i - max_len + 1
else:
streak = 1
val = a[i]
return -1
This turns out to be ~100x faster than the pure Python version.
The numpy
version uses the rolling window trick and the argmax trick. But again, this turns out to be far slower than even the pure Python version, by a substantial ~30x.
def rolling_window(a, window):
a = numpy.ascontiguousarray(a) # This approach requires a C-ordered array
shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
strides = a.strides + (a.strides[-1],)
return numpy.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
def find_reps_numpy(a, max_len):
windows = rolling_window(a, max_len)
return (windows == windows[:, 0:1]).sum(axis=1).argmax()
I tested both of these against a non-jitted version of the first function. (I used Jupyter's %%timeit
feature for testing.)
a = numpy.random.randint(0, 100, 1000000)
%%timeit
find_reps_numpy(a, 3)
28.6 ms ± 553 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
find_reps_orig(a, 3)
4.04 ms ± 40.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
find_reps_numba(a, 3)
8.29 µs ± 89.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Note that these numbers can vary dramatically depending on how deep into a
the functions have to search. For a better estimate of expected performance, we can regenerate a new set of random numbers each time, but it's difficult to do so without including that step in the timing. So for comparison here, I include the time required to generate the random array without running anything else:
a = numpy.random.randint(0, 100, 1000000)
9.91 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
a = numpy.random.randint(0, 100, 1000000)
find_reps_numpy(a, 3)
38.2 ms ± 453 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
a = numpy.random.randint(0, 100, 1000000)
find_reps_orig(a, 3)
13.7 ms ± 404 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
a = numpy.random.randint(0, 100, 1000000)
find_reps_numba(a, 3)
9.87 ms ± 124 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
As you can see, find_reps_numba
is so fast that the variance in the time it takes to run numpy.random.randint(0, 100, 1000000)
is much larger — hence the illusory speedup between the first and last tests.
So the big moral of the story is that numpy
solutions aren't always best. Sometimes even pure Python is faster. In those cases, numba
in nopython
mode may be the best option by far.