0

In the case of this simple regex

pattern = re.compile(r"(.*)*A")

which is immediately compiled, the search time grows exponentially with string length. Is this an expected behavior of python's re library? I would expect that applying a compiled regex on a string should result in linear performance in string length.

pattern.search("x"*23)

1 second.

pattern.search("x"*24)

2 seconds.

pattern.search("x"*25)

4 seconds.

etc.

I tried both py3.7 and 3.8 and even 2.7, same results. However, I couldn't see this problem in applying the same regex from command line in sed.

Peter Franek
  • 577
  • 3
  • 8
  • 25
  • 1
    @superbrain One might expect linear performance because it's possible, and well-known to be possible; any (true) regex can be compiled to a DFA which can be used to recognise the pattern in linear time. – kaya3 Aug 21 '20 at 17:59
  • @kaya3 So now I'm reading about it, but am still surprised that people use python so much, and nobody cares... I came to this question while debugging a real regex that simply never terminates. – Peter Franek Aug 21 '20 at 18:13
  • Just for reference: it seems that the `regex` library https://pypi.org/project/regex/ is compatible with `re` and does not suffer from this catastrophic backtracking -- at least no in simple examples I tried. – Peter Franek Nov 22 '20 at 21:56

0 Answers0