Within standard Python, the bottlenecks here will be count
, which enjoys C speed implementation and the looping.
The looping itself may be hard to speed-up (althogh Cython may be of some help).
Hence, the most important optimization is to reduce the number of loopings.
One obvious way is to let range()
do not exceed half the size of the input (+ 2
: + 1
for rounding issues, + 1
for end extrema exclusion in range()
):
Also, string
is a standard Python module, so better not use it as a variable name.
def has_subpattern_loop(text):
for i in range(1, len(text) // 2 + 2):
subtext = text[:i]
num_subtext = text.count(subtext)
if num_subtext > 1 and num_subtext * len(subtext) == len(text):
return True
return False
A much more effective way of restricting the number of calls to count
is to skip computation when i
is not a multiple of the length of the input.
def has_subpattern_loop2(text):
for i in range(1, len(text) // 2 + 2):
if len(text) % i == 0:
subtext = text[:i]
num_subtext = text.count(subtext)
if num_subtext > 1 and num_subtext * len(subtext) == len(text):
return True
return False
Even better would be to generate only the divisors of the length of the input.
This could be done using sympy
and the approach outlined here:
import sympy as sym
import functools
def get_divisors(n):
if n == 1:
yield 1
return
factors = list(sym.factor_.factorint(n).items())
nfactors = len(factors)
f = [0] * nfactors
while True:
yield functools.reduce(lambda x, y: x * y, [factors[x][0]**f[x] for x in range(nfactors)], 1)
i = 0
while True:
f[i] += 1
if f[i] <= factors[i][1]:
break
f[i] = 0
i += 1
if i >= nfactors:
return
def has_subpattern_divs(text):
for i in get_divisors(len(text)):
subtext = text[:i]
num_subtext = text.count(subtext)
if num_subtext > 1 and num_subtext * len(subtext) == len(text):
return True
return False
A completely different approach is the one proposed in @ВладДавидченко answer:
def has_subpattern_find(text):
return (text * 2).find(text, 1) != len(text)
or the more memory efficient (requires ~50% less additional memory compared to has_subpattern_find2()
):
def has_subpattern_find2(text):
return (text + text[:len(text) // 2 + 2]).find(text, 1) > 0
and it is based on the idea that if there is a exactly self-repeating string, the string itself must be found in a circularly extended string:
Input: abab
Extension1: abababab
Found1: |-abab
Extension2: ababab
Found2: |-abab
Input: ababa
Extension1: ababaababa
Found1: |----ababa
Extension2: ababab
Found2: NOT FOUND!
The find
-based method are the fastest, with has_subpattern_find()
being fastest in the small input size regime, and has_subpattern_find2()
gets generally faster in the intermediate and large input size regime (especially in the False
case).
For shorter inputs, the direct looping approaches (especially has_subpattern_loop2()
) are fastest, closely followed (but sometimes surpassed by has_subpattern_re()
), but as soon as the input gets bigger (and especially for the False
outcome), the has_subpattern_divs()
method gets to be the fastest (aside of find
-based ones) by far and large, as shown by the following benchmarks.
For the True
outcome, has_subpattern_loop2()
gets to be the fastest due to the very small number of loops required, which is independent of the input size.
The input is generated as a function of n
using:
def gen_input(n, m=0):
g = string.ascii_lowercase
if not m:
m = n
offset = '!' if n % 2 else ''
return g[:n] * (m // min(n, len(g)) + 2) + offset
so that if n
is even, the has_subpattern*()
always return True and the opposite for odd n
.
Note that, in general, the has_subpattern()
function will depend not only on the raw size of the input but also on the length of the repeating string, if any. This is not explored in the benchmarks, except for the odd/even separation.


(Full code available here).
(EDITED to include some more solutions as well as comparison with regex-based solution from @RomanPerekhrest)
(EDITED to include some more solutions based on the find from @ВладДавидченко)