I have a function that tries a list of regexes on some text to see if there's a match.
    @timeout(1)
    def get_description(data, old):
        description = None
        if old:
            for rx in rxs:
                try:
                    matched = re.search(rx, data, re.S|re.M)
                    if matched is not None:
                        try:
                            description = matched.groups(1)
                            if description:
                                return description
                            else:
                                continue
                        except TimeoutError as why:
                            print(why)
                            continue
                    else:
                        continue
                except Exception as why:
                    print(why)
                    pass
I call this function in a loop over a bunch of text files. On one file, execution hangs until I interrupt it:
    Traceback (most recent call last):
      File "extract.py", line 223, in <module>
        scrape()
      File "extract.py", line 40, in scrape
        metadata = get_metadata(f)
      File "extract.py", line 186, in get_metadata
        description = get_description(text, True)
      File "extract.py", line 64, in get_description
        matched = re.search(rx, data, re.S|re.M)
      File "C:\Users\Joseph\AppData\Local\Programs\Python\Python36\lib\re.py", line 182, in search
        return _compile(pattern, flags).search(string)
    KeyboardInterrupt
It simply hangs on evaluating matched = re.search(rx, data, re.S|re.M). For many other files, when no match is found, it moves on to the next regex. With this file, it does nothing and throws no exception. Any ideas what could be causing this?
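One possible cause, not confirmed for this file, is catastrophic backtracking: a regex with nested quantifiers can take exponentially long to fail on certain inputs, which looks exactly like a hang with no exception. The following is a minimal sketch of that behaviour. The pattern SLOW_RX and the helper time_search are hypothetical and exist only for illustration; the pattern is not one of the regexes in rxs.

```python
import re
import time

# Hypothetical pattern with nested quantifiers; NOT one of the regexes in rxs.
SLOW_RX = r"(a+)+$"


def time_search(n):
    """Search n 'a' characters followed by 'b', so no match is possible."""
    text = "a" * n + "b"
    start = time.perf_counter()
    matched = re.search(SLOW_RX, text, re.S | re.M)
    elapsed = time.perf_counter() - start
    return matched, elapsed


# Sizes are kept small on purpose so the demo finishes quickly.
for n in (10, 14, 18):
    matched, elapsed = time_search(n)
    print(n, matched, "{:.4f}s".format(elapsed))
```

With this pattern, each additional character should roughly double the time re.search needs before it gives up, so an input only a few dozen characters longer would appear to freeze. If one of the real regexes has a similar shape, the same thing could happen on a single unlucky file.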
EDIT: I'm now trying to detect timeout errors (this is more efficient for me than changing the regexes).
The TimeoutError, borrowed from this question, is triggered but doesn't let the script continue. It simply prints 'Timer expired' and stays frozen.