
I have a regex that might take a long time to execute, despite my best efforts at optimization. I want to be able to interrupt it when it stalls and proceed with the rest of the program.

Other languages, such as C#, have a timeout property for regex execution, and I am wondering why Python 3 does not seem to offer the same approach.

Does Python 3 internally impose some kind of maximum execution time? It seems that after a long time the regex aborts and execution continues. Is that true?

I would like to look into this for Python 3 and use a platform-independent approach (I have seen decorators that work only on *nix systems, using signals...).

Maybe the answer is to handle this problem with a more general approach to stopping a function in Python, as in "How to add a timeout to a function in Python" or "Stopping a function in Python using a timeout".

How can I implement such a timeout?

Xiddoc
robob
  • No, because the solution there uses a decorator that is system dependent. Moreover, that post does not make clear which version of Python it concerns. Maybe Python 3 has a different approach. – robob Mar 29 '17 at 06:39
  • @WiktorStribiżew rather, that question is an inferior duplicate of one of the ones OP found. – Karl Knechtel Jan 03 '23 at 11:37
  • Since there clearly is not any such functionality built into the standard library regex, there are three ways to interpret the question: 1) "Why not?" -> not suitable for the site; we aren't mind-readers, and we don't deal in the subjective. 2) "What third-party library can I use instead?" -> explicitly off topic; we don't do such recommendations. 3) "How can I implement it myself?" -> there is nothing special about implementing a timeout simply because the task being timed-out is a regex operation; OP already found a post with a generic solution, or at least an attempt. – Karl Knechtel Jan 04 '23 at 03:03
  • So that makes this question a duplicate, if it's suitable at all. If there are any flaws in existing answers at the now-linked duplicate, they should be corrected over there, perhaps with new, better answers. I also closed the other question OP found, as well as the other candidate duplicate, as duplicates of that one. – Karl Knechtel Jan 04 '23 at 03:04

1 Answer


As for why Python's built-in re module doesn't offer a timeout like C# does, Tim Peters commented on this matter in a now-closed issue:

Introducing some kind of optional timeout is too involved to just drop in without significant discussion and design effort first.

My first take: it wouldn't really help, because nobody would use it until after it was too late.

However, there is a third-party module on PyPI called regex, which aims to be backwards-compatible with the re module while offering additional functionality (such as timeouts). Here is a snippet taken directly from its documentation that shows how to use it:

The matching methods and functions support timeouts. The timeout (in seconds) applies to the entire operation:

>>> import regex
>>> from time import sleep
>>>
>>> def fast_replace(m):
...     return 'X'
...
>>> def slow_replace(m):
...     sleep(0.5)
...     return 'X'
...
>>> regex.sub(r'[a-z]', fast_replace, 'abcde', timeout=2)
'XXXXX'
>>> regex.sub(r'[a-z]', slow_replace, 'abcde', timeout=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python310\lib\site-packages\regex\regex.py", line 278, in sub
    return pat.sub(repl, string, count, pos, endpos, concurrent, timeout)
TimeoutError: regex timed out

The timeout functionality in this module is great, because it is wired directly into the main matching loop (see safe_check_cancel), and is not based on any platform-dependent solution, such as leveraging the signal module.
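If adding a third-party dependency is not an option, the generic function-timeout idea mentioned in the question can still be applied to the standard re module without signals: run the search in a separate Python process and kill that process when the deadline passes. The following is a minimal sketch, not taken from the regex documentation; search_with_timeout and _CHILD_CODE are hypothetical names, and it assumes the pattern and the text are plain strings that can be serialized as JSON.

```python
import json
import subprocess
import sys
from typing import Optional, Tuple

# Hypothetical child program (not part of re or regex): reads
# {"pattern": ..., "text": ...} as JSON from stdin, runs re.search,
# and prints the match span (or null) as JSON on stdout.
_CHILD_CODE = r"""
import json, re, sys
job = json.load(sys.stdin)
m = re.search(job["pattern"], job["text"])
json.dump(m.span() if m else None, sys.stdout)
"""


def search_with_timeout(pattern: str, text: str, timeout: float) -> Optional[Tuple[int, int]]:
    """Run re.search(pattern, text) in a separate Python process.

    Returns the (start, end) span of the first match, or None if nothing
    matches. Raises TimeoutError if the search does not finish within
    `timeout` seconds; the child process is killed in that case.
    """
    # ensure_ascii (the json default) keeps the payload pure ASCII,
    # so the platform's default stdin/stdout encoding does not matter.
    payload = json.dumps({"pattern": pattern, "text": text})
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD_CODE],
            input=payload,
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills and reaps the child when this expires
            check=True,       # CalledProcessError if the child fails, e.g. on an invalid pattern
        )
    except subprocess.TimeoutExpired:
        raise TimeoutError(f"regex search exceeded {timeout} s") from None
    span = json.loads(result.stdout)  # JSON turns the tuple into a list
    return tuple(span) if span is not None else None
```

For example, search_with_timeout(r"b+", "aabbbcc", timeout=5) returns (2, 5), while a catastrophically backtracking call such as search_with_timeout(r"(a+)+$", "a" * 40 + "!", timeout=1) raises TimeoutError and the program can carry on. A process is used rather than a thread because Python threads cannot be forcibly stopped, so a thread-based timeout would only stop waiting for the match, not the match itself. The trade-offs are that the timeout also covers interpreter start-up, and only the span comes back, since the Match object lives in the other process.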

Xiddoc