This existing question doesn't address my problem: How to timeout function in python, timeout less than a second
A comment there describes the issue I'm having. According to the signal documentation, the signal-based approach won't work here: "Although Python signal handlers are called asynchronously as far as the Python user is concerned, they can only occur between the “atomic” instructions of the Python interpreter. This means that signals arriving during long calculations implemented purely in C (such as regular expression matches on large bodies of text) may be delayed for an arbitrary amount of time."
I'm attempting to use regex to parse content from the web using the gevent Python library. It works well, except when I encounter really large content. Is there a way to terminate a thread that doesn't complete in x seconds?
Here's what I've come up with, but it doesn't work:
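import re
import signal

def get_all_matches(self, content, the_regex, timeout=5):
    try:
        def kill_regex(*args, **kwargs):
            raise TimeoutError
        # Request SIGALRM after `timeout` seconds; the handler raises to abort.
        signal.signal(signal.SIGALRM, kill_regex)
        signal.alarm(int(timeout))
        # The handler cannot run while the regex engine is busy inside C code.
        return re.findall(the_regex, content, re.IGNORECASE)
    except Exception:
        return []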
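For comparison, here is a minimal sketch of one possible alternative: run the match in a child process and kill that process if it exceeds the timeout, since the operating system can stop a process even while it is inside C code. The names `findall_with_timeout` and `_findall_worker` are hypothetical, the `fork` start method assumes a Unix-like system (as `SIGALRM` already does), and this has not been verified under gevent, where the blocking `poll()` call may stall other greenlets.

```python
import multiprocessing
import re


def _findall_worker(conn, pattern, content, flags):
    # Hypothetical helper: runs in the child process and sends the matches back.
    try:
        conn.send(re.findall(pattern, content, flags))
    finally:
        conn.close()


def findall_with_timeout(pattern, content, timeout=5.0, flags=re.IGNORECASE):
    """Return re.findall() results, or [] if the match exceeds `timeout` seconds."""
    # Assumption: a Unix-like OS where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(
        target=_findall_worker, args=(child_conn, pattern, content, flags)
    )
    proc.start()
    child_conn.close()  # the parent keeps only the receiving end
    try:
        # poll() waits up to `timeout` seconds; fractional values are allowed.
        if parent_conn.poll(timeout):
            return parent_conn.recv()
        return []
    except EOFError:
        # The child exited without sending a result (e.g. an invalid pattern).
        return []
    finally:
        if proc.is_alive():
            proc.kill()  # SIGKILL does not depend on the interpreter's cooperation
        proc.join()
        parent_conn.close()
```

Starting a process per match is costly, so this trades throughput for a hard upper bound on how long a single regex can run.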