I use re.match(pattern, str) for regex matching (Python 3.10, Windows 10), but when a pattern is badly written, catastrophic backtracking sometimes occurs. As a result, the program gets stuck in re.match and cannot continue.

Since I have a lot of regular expressions, I can't change them one by one.

I've tried to limit the function's execution time, but because I'm on Windows, none of these work:

  • signal (only works on Unix)
  • func_timeout
  • timeout-decorator
  • eventlet

My test code is as follows. I tried the answer to "How to limit execution time of a function call?", but it doesn't work:

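import re
import threading
import _thread
from contextlib import contextmanager


class TimeoutException(Exception):
    def __init__(self, msg=''):
        super().__init__(msg)
        self.msg = msg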


@contextmanager
def time_limit(seconds, msg=''):
    timer = threading.Timer(seconds, lambda: _thread.interrupt_main())
    timer.start()
    try:
        yield
    except KeyboardInterrupt:
        raise TimeoutException("Timed out for operation {}".format(msg))
    finally:
        # if the action ends in specified time, timer is canceled
        timer.cancel()

def my_func():
    astr = "http://www.fapiao.com/dzfp-web/pdf/download?request=6e7JGm38jfjghVrv4ILd-kEn64HcUX4qL4a4qJ4-CHLmqVnenXC692m74H5oxkjgdsYazxcUmfcOH2fAfY1Vw__%5EDadIfJgiEf"
    # Raw string so that "\/" is not treated as an invalid string escape.
    pattern = r"^([hH][tT]{2}[pP]://|[hH][tT]{2}[pP][sS]:)(([A-Za-z0-9-~]+).)+([A-Za-z0-9-~\/])+$"
    reg = re.compile(pattern)
    result = reg.match(astr)
    return result

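if __name__ == '__main__':
    try:
        # Wrap the call in the context manager so the timer is actually armed.
        with time_limit(2, 'my_func'):
            my_func()
    except TimeoutException as e:
        print(e.msg)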

So is there any way to:

  • stop re.match when "Catastrophic Backtracking" occurs
  • limit the number of matching steps or the matching time, or raise an exception when matching takes too long
  • or limit the execution time of a function
luckin
  • Similar question: https://stackoverflow.com/questions/47876259/python-regex-catastrophic-backtracking-in-url-handling, but timeout-decorator does not solve my problem. – luckin Jul 06 '23 at 09:07
  • Have you tried running your regex match in a thread and putting a timeout on it? such as https://stackoverflow.com/questions/35548468/how-to-set-timeout-to-threads – Learning is a mess Jul 06 '23 at 09:07
  • Yes, I have tried running it in a thread, but it still doesn't work. I'm not sure I'm using threads correctly, though. – luckin Jul 06 '23 at 09:26
  • please share your code – Learning is a mess Jul 06 '23 at 09:31
  • Sorry, I'm new to Stack Overflow. I have added my code to the question. Please help check it. Thanks! – luckin Jul 07 '23 at 01:30
  • Update: The top answer to this question, https://stackoverflow.com/questions/28507359/limit-function-execution-in-python, solved my problem. – luckin Jul 07 '23 at 07:06

1 Answer

One option I know of is to start a child process and terminate it if it hasn't completed within a certain amount of time. The result from the "worker" function, my_func, must then be passed back via a managed queue instance:

def my_func(result_queue):
    import re

    astr = "http://www.fapiao.com/dzfp-web/pdf/download?request=6e7JGm38jfjghVrv4ILd-kEn64HcUX4qL4a4qJ4-CHLmqVnenXC692m74H5oxkjgdsYazxcUmfcOH2fAfY1Vw__%5EDadIfJgiEf"
    # Raw string so that "\/" is not treated as an invalid string escape.
    pattern = r"^([hH][tT]{2}[pP]://|[hH][tT]{2}[pP][sS]:)(([A-Za-z0-9-~]+).)+([A-Za-z0-9-~\/])+$"
    reg = re.compile(pattern)
    result = reg.match(astr)
    # A match object cannot be pickled, so send back plain data instead
    # (or None when the pattern does not match):
    result_queue.put(
        None if result is None else {
            'span': result.span(),
            'group0': result[0],
            'groups': result.groups()
        }
    )

if __name__ == '__main__':
    from multiprocessing import Process, Manager

    with Manager() as manager:
        result_queue = manager.Queue()
        p = Process(target=my_func, args=(result_queue,))
        p.start()
        p.join(1) # Allow up to 1 second for process to complete
        if p.exitcode is None:
            # The process has not completed. So kill the process:
            print('killing process')
            p.terminate()
            p.join() # Wait for the terminated process to be cleaned up.
        else:
            # The process has completed. So get the result:
            result = result_queue.get()
            print(result)
            p.join() # This should return immediately since the process has completed.
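
A variant of the same idea, offered as a sketch under assumptions rather than as part of the approach above: instead of multiprocessing, the match can run in a separate interpreter started with subprocess.run, whose timeout argument kills the child when the time limit expires. The helper name match_with_timeout and the JSON result layout are hypothetical, chosen only to mirror the dictionary used above. Passing the pattern and text as command-line arguments assumes they fit within the platform's argument-length limit.

```python
import json
import subprocess
import sys

# Code run by the child interpreter; the pattern and text arrive via sys.argv.
_CHILD_CODE = """
import json, re, sys
m = re.match(sys.argv[1], sys.argv[2])
print(json.dumps(None if m is None else
                 {"span": m.span(), "group0": m[0], "groups": m.groups()}))
"""


def match_with_timeout(pattern, text, seconds=1.0):
    """Hypothetical helper: run re.match in a child interpreter.

    Returns a dict describing the match (tuples become lists in JSON),
    None if the pattern does not match, and raises TimeoutError if the
    child has not finished within `seconds`.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD_CODE, pattern, text],
            capture_output=True,
            text=True,
            timeout=seconds,  # the child is killed when this expires
            check=True,       # an invalid pattern raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        raise TimeoutError(f"regex did not finish within {seconds}s") from None
    return json.loads(proc.stdout)
```

Each call pays the cost of starting an interpreter, so this is likely better suited to occasional validation than to matching inside a tight loop.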
Booboo