1

In Python, you can use the in keyword for checking whether a string contains a substring. A naive implementation, but fast on average, would be something like

def in(small, large):
    for i in range(len(large)):
        if all(c1 == c2 for c1, c2 in zip(small, large[i:])):
            return True
    return False

The run time of this substring matching algorithm is quadratic in the worst case. However, when I try the adversarial example

In [10]: sub = 'a'*10000 + 'b'

In [11]: full = 'a'*1000000

In [12]: sub in full
Out[12]: False

I get the correct answer near instanteneously.

This suggests that under the hood, Python uses a faster algorithm like KMP to check for the presence of this substring. This strikes me as weird, because it seems that the extra computation time for KMP, even if good for adversarial examples like these, is almost never worth it in practice.

So what's going on with Python in?

Mees de Vries
  • 639
  • 3
  • 24
  • 3
    You do know that the source code is freely available, yes? – paxdiablo May 12 '19 at 23:04
  • You can find the string searching algorithm [here](https://github.com/python/cpython/blob/master/Objects/stringlib/fastsearch.h). – gmds May 12 '19 at 23:11
  • Please support your conjecture through the source code, if possible. Since "worth it in practice" is subjective(based on the input) in nature. –  May 12 '19 at 23:15

0 Answers0