
Consider a sample 1D integer numpy array a. Actual instances of a could have up to a million elements:

a = np.array([2, 6, 3, 2, 4, 5, 4, 2, 4, 7, 8, 7, 1])

I need to have a function pal(n) where argument n >= 2. The function is to identify n consecutive elements of a that form a palindromic sequence. It is sufficient to just identify the index of the first element.

examples: a) pal(5) = [3] because the 5 consecutive elements starting at index = 3 form a palindromic sequence (i.e. 2,4,5,4,2)

b) pal(3) = [6,9] because the 3 elements starting at index 6 and at index 9 yield palindromic sequences (i.e. 4,2,4 and 7,8,7)

This is fairly straightforward using loops, but I'm wondering if a non-loopy solution is possible.

dawg
user109387
  • Can you share the *fairly straight forward* solution? – Michael Szczesny Aug 25 '21 at 13:33
  • how about using your vector to create a toeplitz matrix, inverting the vector, subtracting it from each row of the toeplitz matrix and then searching for rows with consecutive zeros using [one of the solutions in this post](https://stackoverflow.com/questions/24342047/count-consecutive-occurences-of-values-varying-in-length-in-a-numpy-array)? – yann ziselman Aug 25 '21 at 13:37
  • What do you mean by non-loopy solution? A solution that uses built-in numpy functions? But if those built-in numpy functions use a loop under the hood, is it still a non-loopy solution? Also, a well-written for loop solution is likely to be more performant than any other solution, so why do you want to use another solution? And last point, `pal(3)` should output `[4,6,9]`, no? – obchardon Aug 25 '21 at 13:39
  • I don't quite understand your apparent aversion to using overt loops. As @obchardon has suggested, the "behind the scenes" solutions will be using loops anyway. I have code that will search for all 5-character numeric palindromes in a list containing one million single digit values in ~250ms. –  Aug 25 '21 at 14:23
  • @DarkKnight - Can you provide a solution with loops, because I think you might be right? I would like to compare the performance impact with my solution. – Michael Szczesny Aug 25 '21 at 14:25
  • @MichaelSzczesny Solution provided –  Aug 25 '21 at 14:27
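A quick plain-Python check confirms obchardon's point that `pal(3)` should also include index 4 (the window `4, 5, 4`):

```python
a = [2, 6, 3, 2, 4, 5, 4, 2, 4, 7, 8, 7, 1]
n = 3
# A length-n window is palindromic iff it equals its own reverse.
pals = [i for i in range(len(a) - n + 1) if a[i:i + n] == a[i:i + n][::-1]]
print(pals)  # [4, 6, 9]
```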

3 Answers


This answer uses an overt loop and performs extremely well: average duration is ~250 ms on a list of one million digits (excluding construction of the pseudo-random list).

import random
from datetime import datetime

s = [random.randint(1, 9) for _ in range(1_000_000)]

def pal(n):
    r = []
    for i in range(len(s) - n + 1):
        a = s[i:i + n]
        if a == a[::-1]:
            r.append(i)
    return r


_start = datetime.now()
result = pal(5)
_end = datetime.now()
print(f'Duration={_end-_start}')

You can use a sliding_window_view and check if the flipped view is identical. This solution is fully vectorized (non-loopy) but creates one additional array of shape (len(a)-n+1, n).

import numpy as np

a = np.array([2, 6, 3, 2, 4, 5, 4, 2, 4, 7, 8, 7, 1])

def pal(arr, n):
    x = np.lib.stride_tricks.sliding_window_view(arr, n)
    return np.where(~(x != np.fliplr(x)).any(1))[0]

Your test cases:

pal(a,3)

Output

array([4, 6, 9])
pal(a,5)

Output

array([3])

Micro-Benchmark

Runtime benchmarks for the loop solution against the vectorized solution on a google colab instance in a jupyter notebook.

Results

vectorized with 100000
100 loops, best of 5: 3.74 ms per loop
vectorized with 1000000
10 loops, best of 5: 34.9 ms per loop
vectorized with 10000000
1 loop, best of 5: 343 ms per loop
loop with 100000
10 loops, best of 5: 33.3 ms per loop
loop with 1000000
1 loop, best of 5: 346 ms per loop
loop with 10000000
1 loop, best of 5: 3.42 s per loop

Code for the benchmark

import random
import numpy as  np

def pal_vec(arr, n):
    x = np.lib.stride_tricks.sliding_window_view(arr, n)
    return np.where(~(x != np.fliplr(x)).any(1))[0]

def pal_loop(n):
    r = []
    for i in range(len(s) - n + 1):
        a = s[i:i + n]
        if a == a[::-1]:
            r.append(i)
    return r

#preparing the data
r = [5,6,7]
looparr = [[random.randint(1, 9) for _ in range(10**k)] for k in r]
nparr = [np.array(e) for e in looparr]


# benchmarking
for arr in nparr:
    print(f'vectorized with {len(arr)}')
    %timeit pal_vec(arr,3)
for s in looparr:
    print(f'loop with {len(s)}')
    %timeit pal_loop(3)
Michael Szczesny
  • Good solution, but it will be interesting to compare the two algorithm for `n=50` for example, since the memory needed by the linearized solution will also increase linearly. – obchardon Aug 25 '21 at 15:54
  • Very insightful - thank you for taking the time. – user109387 Aug 25 '21 at 17:22
  • The sliding-window idea is one I can use in other places as well. When I was doing this manually, that's essentially the approach I used, without knowing it could be coded so compactly. – user109387 Aug 25 '21 at 17:27
  • I like this solution but get error: 'numpy.lib.stride_tricks' has no attribute 'sliding_window_view'. Possibly need to update my Numpy version. – user109387 Aug 25 '21 at 20:16
  • You need `numpy 1.20.0` or newer. You can do the same with [`as_strided`](https://numpy.org/doc/stable/reference/generated/numpy.lib.stride_tricks.as_strided.html) but it's failure prone and hard to understand, even the numpy documentation advises to avoid it. – Michael Szczesny Aug 25 '21 at 20:20
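For reference, here is a sketch of the `as_strided` equivalent mentioned in the comment above (a hypothetical variant, not from the answer; the shape/strides arithmetic is easy to get wrong, which is why the numpy docs discourage it):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def pal_strided(arr, n):
    # Build the same (len(arr) - n + 1, n) window view by hand.
    # writeable=False guards against accidental writes through the view.
    step = arr.strides[0]
    x = as_strided(arr, shape=(arr.size - n + 1, n),
                   strides=(step, step), writeable=False)
    # A window is palindromic iff it equals its left-right flip.
    return np.where((x == np.fliplr(x)).all(axis=1))[0]

a = np.array([2, 6, 3, 2, 4, 5, 4, 2, 4, 7, 8, 7, 1])
print(pal_strided(a, 3))  # [4 6 9]
```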

Just another example, where the for loop will probably be faster than a vectorized option once `n` starts to get bigger:

def pal_loop(arr, n):
    r = []
    l = n // 2                          # only the outer pairs need checking
    for i in range(len(arr) - n + 1):   # loop over every window start
        for j in range(l):              # compare elements pair by pair
            if arr[i + j] != arr[i + n - 1 - j]:
                break                   # stop at the first mismatched pair
        else:
            r.append(i)                 # no break: the window is a palindrome
    return r

Benchmarking this version shows almost no difference between small and large `n`, since the inner loop breaks early on the first mismatched pair.
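A quick way to sanity-check any pair-by-pair variant (a self-contained sketch; `pal_slice` and `pal_pairs` are hypothetical names, not from the answers) is to compare it against the brute-force slice check on random data:

```python
import random

def pal_slice(arr, n):
    # Reference: compare each window against its reverse (no early exit).
    return [i for i in range(len(arr) - n + 1)
            if arr[i:i + n] == arr[i:i + n][::-1]]

def pal_pairs(arr, n):
    # Pair-by-pair with early exit; the `else` clause of the inner `for`
    # runs only when no pair mismatched (i.e. no `break` occurred).
    r = []
    for i in range(len(arr) - n + 1):
        for j in range(n // 2):
            if arr[i + j] != arr[i + n - 1 - j]:
                break
        else:
            r.append(i)
    return r

a = [2, 6, 3, 2, 4, 5, 4, 2, 4, 7, 8, 7, 1]
print(pal_pairs(a, 3))  # [4, 6, 9]

random.seed(0)
data = [random.randint(1, 9) for _ in range(10_000)]
for n in (2, 3, 5, 50):
    assert pal_pairs(data, n) == pal_slice(data, n)
```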

obchardon