The convolution approach suggested in scleronomic's answer is very promising, especially if you're looking for more than two consecutive elements.
However, the implementation presented in that answer might not be the most efficient, because it consists of two steps: `diff()` followed by `convolve()`.
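For reference, here is a minimal self-contained sketch of that two-step approach (using the same example signal and `n_consecutive` as in the code further below):

```python
import numpy

# example signal and minimum run length, as in the code below
signal = numpy.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0])
n_consecutive = 3

# a run of n_consecutive equal elements corresponds to a window of
# n_consecutive - 1 zero differences, so the moving sum of absolute
# differences is zero exactly where such a run starts
abs_diffs = numpy.abs(numpy.diff(signal))
moving_sum = numpy.convolve(abs_diffs, numpy.ones(n_consecutive - 1), mode='valid')
result = numpy.any(moving_sum == 0)  # True for this signal
```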
Alternative implementation
If we consider that `diff()` can also be calculated using convolution, we can combine the two steps into a single convolution.
The following alternative implementation only requires a single convolution of the full signal, which is advantageous if the signal has many elements.
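As a quick sanity check (using the example signal from the code below), the first-order difference is exactly a convolution with the kernel `[1, -1]`, and convolution is associative, which is what allows the two kernels to be combined:

```python
import numpy

x = numpy.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0])

# numpy.diff expressed as a convolution
assert numpy.array_equal(numpy.convolve(x, [1, -1], mode='valid'), numpy.diff(x))

# associativity: convolving with two kernels in sequence gives the same
# result as convolving once with the pre-combined kernel
k = numpy.ones(2)
two_steps = numpy.convolve(numpy.convolve(x, [1, -1], mode='valid'), k, mode='valid')
one_step = numpy.convolve(x, numpy.convolve([1, -1], k), mode='valid')
assert numpy.allclose(two_steps, one_step)
```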
Note that within a single convolution we cannot take the absolute values of the differences, so differences of opposite sign could cancel and produce false positives (as mentioned in this comment). To make such cancellation practically impossible, we add some random noise to the unit kernel instead.
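To illustrate the problem (a small example of my own, not from the original answer): with an unweighted kernel the signed differences of `[0, 1, 0]` cancel exactly, whereas the randomly weighted kernel avoids this:

```python
import numpy

# unweighted kernel: the differences +1 and -1 cancel, falsely signalling a run
plain = numpy.convolve([0, 1, 0], numpy.convolve([1, -1], [1, 1]), mode='valid')
print(plain)  # [0] -- false positive

# randomly weighted kernel: cancellation would require two random weights to match exactly
rng = numpy.random.default_rng()
noisy = numpy.convolve([0, 1, 0], numpy.convolve([1, -1], 1 + 0.01 * rng.random(2)), mode='valid')
print(noisy)  # nonzero (almost surely)
```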
```python
import numpy

# example signal
signal = numpy.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0])

# minimum number of consecutive elements
n_consecutive = 3

# convolution kernel for weighted moving sum (with small random component)
rng = numpy.random.default_rng()
random_kernel = 1 + 0.01 * rng.random(n_consecutive - 1)

# convolution kernel for first-order difference (similar to numpy.diff)
diff_kernel = [1, -1]

# combine the kernels so we only need to do one convolution with the signal
combined_kernel = numpy.convolve(diff_kernel, random_kernel, mode='full')

# convolve the signal to get the moving weighted sum of differences
moving_sum_of_diffs = numpy.convolve(signal, combined_kernel, mode='valid')

# check if the moving sum is zero anywhere
result = numpy.any(moving_sum_of_diffs == 0)
```
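For the example signal this yields `True`. As a bonus (a small extension of my own), the locations of the zeros also tell you where the runs start:

```python
# indices i such that signal[i : i + n_consecutive] consists of equal elements
run_starts = numpy.flatnonzero(moving_sum_of_diffs == 0)
print(result, run_starts)  # True [7] for the example signal
```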
See the DSP guide for a detailed discussion of convolution.
Timing
The difference between the two implementations boils down to this:
```python
import numpy

def original(signal, unit_kernel):
    return numpy.convolve(numpy.abs(numpy.diff(signal)), unit_kernel, mode='valid')

def alternative(signal, combined_kernel):
    return numpy.convolve(signal, combined_kernel, mode='valid')
```
where `unit_kernel = numpy.ones(n_consecutive - 1)` and `combined_kernel` is defined above.
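The two functions return different values (absolute versus randomly weighted differences), but the locations of their zeros should coincide. A quick check of this, assuming the two functions above:

```python
rng = numpy.random.default_rng()
signal = rng.integers(0, 2, size=1000)
n_consecutive = 5
unit_kernel = numpy.ones(n_consecutive - 1)
combined_kernel = numpy.convolve([1, -1], 1 + 0.01 * rng.random(n_consecutive - 1))

# both outputs have the same length and (almost surely) the same zero locations
zeros_original = original(signal, unit_kernel) == 0
zeros_alternative = numpy.isclose(alternative(signal, combined_kernel), 0)
assert numpy.array_equal(zeros_original, zeros_alternative)
```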
Comparison of these two functions using `timeit` shows that `alternative()` can be several times faster for small kernel sizes (i.e. small values of `n_consecutive`). For large kernel sizes, however, the advantage becomes negligible, because the cost of the convolution dominates that of the `diff()`.
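A minimal timing sketch along these lines (assuming the functions above; the exact ratio depends on the hardware and the signal length):

```python
import timeit

rng = numpy.random.default_rng()
signal = rng.integers(0, 2, size=100_000)
n_consecutive = 3  # small kernel: alternative() should win here
unit_kernel = numpy.ones(n_consecutive - 1)
combined_kernel = numpy.convolve([1, -1], 1 + 0.01 * rng.random(n_consecutive - 1))

t_original = timeit.timeit(lambda: original(signal, unit_kernel), number=100)
t_alternative = timeit.timeit(lambda: alternative(signal, combined_kernel), number=100)
print(f'original: {t_original:.3f} s, alternative: {t_alternative:.3f} s')
```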
Notes:
- For large kernel sizes I would prefer the original two-step approach, as I think it is easier to understand.
- Due to numerical issues it may be necessary to replace `numpy.any(moving_sum_of_diffs == 0)` by `numpy.any(numpy.abs(moving_sum_of_diffs) < very_small_number)`, see e.g. here; a sketch is shown below.
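A sketch of that tolerance-based check (`very_small_number` is a placeholder; the right value depends on the scale of your data):

```python
very_small_number = 1e-12  # assumed tolerance; tune to the scale of your data
result = numpy.any(numpy.abs(moving_sum_of_diffs) < very_small_number)

# alternatively, via numpy.isclose
result = numpy.any(numpy.isclose(moving_sum_of_diffs, 0, atol=very_small_number))
```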