I have a function that tries a list of regexes on some text to see if there's a match.

import re

# rxs (a list of regex patterns) and the timeout decorator are defined elsewhere.
@timeout(1)
def get_description(data, old):
    if not old:
        return None
    for rx in rxs:
        try:
            matched = re.search(rx, data, re.S | re.M)
            if matched is None:
                continue
            # groups(1) returns a tuple of all capture groups,
            # substituting 1 for any group that did not participate.
            description = matched.groups(1)
            if description:
                return description
        except TimeoutError as why:
            print(why)
        except Exception as why:
            print(why)
    return None

I call this function in a loop over a batch of text files. On one file, execution keeps stopping:

Traceback (most recent call last):
  File "extract.py", line 223, in <module>
    scrape()
  File "extract.py", line 40, in scrape
    metadata = get_metadata(f)
  File "extract.py", line 186, in get_metadata
    description = get_description(text, True)
  File "extract.py", line 64, in get_description
    matched = re.search(rx, data, re.S|re.M)
  File "C:\Users\Joseph\AppData\Local\Programs\Python\Python36\lib\re.py", line 182, in search
    return _compile(pattern, flags).search(string)
KeyboardInterrupt

It simply hangs while evaluating matched = re.search(rx, data, re.S|re.M). For many other files, when no match is found, it moves on to the next regex. With this file, it does nothing and throws no exception. Any ideas what could be causing this?
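For reference, here is a minimal sketch of one way re.search can appear to hang. It assumes the cause is catastrophic backtracking, which I have not confirmed for my own patterns. The pattern, the input, and the helper name time_failed_search are illustrative and are not taken from rxs.

```python
import re
import time

# Illustrative pattern with nested quantifiers (an assumption, not one of rxs).
PATTERN = re.compile(r"(a+)+$")


def time_failed_search(n):
    """Time a search that cannot match: n 'a' characters followed by 'b'."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = PATTERN.search(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # The running time grows quickly with n even though no match exists.
    for n in (12, 16, 20):
        result, elapsed = time_failed_search(n)
        print(n, result, "%.4f s" % elapsed)
```

If one of my regexes behaves like this on the problem file, the search would eventually finish, but not within any practical time.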

EDIT: I'm now trying to detect timeout errors instead (this is more efficient for me than changing the regexes).

The timeout decorator, borrowed from this question, is triggered, but the script does not keep running. It simply prints 'Timer expired' and stays frozen.
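One alternative I am considering is to run each search in a separate Python process that can be killed when it takes too long. The following is only a sketch under that assumption, not a tested fix for my script. The names search_with_timeout and _CHILD are hypothetical, and the result is returned as a list of captured groups because a match object cannot be passed between processes.

```python
import json
import re
import subprocess
import sys

# Code run by the child interpreter: read the job from stdin,
# run the search, and write the captured groups (or null) to stdout.
_CHILD = r"""
import json, re, sys
job = json.load(sys.stdin)
m = re.search(job["pattern"], job["text"], job["flags"])
json.dump(list(m.groups()) if m else None, sys.stdout)
"""


def search_with_timeout(pattern, text, flags=0, seconds=1.0):
    """Return the captured groups as a list, None if there is no match,
    or raise TimeoutError if the child does not finish in time."""
    job = json.dumps({"pattern": pattern, "text": text, "flags": int(flags)})
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD],
            input=job,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=seconds,  # the child is killed when this expires
            check=True,
        )
    except subprocess.TimeoutExpired:
        raise TimeoutError("regex search exceeded %s s" % seconds) from None
    return json.loads(done.stdout)


if __name__ == "__main__":
    print(search_with_timeout(r"(\w+) world", "hello world", re.S | re.M))
    try:
        search_with_timeout(r"(a+)+$", "a" * 40 + "b", seconds=0.5)
    except TimeoutError as why:
        print(why)
```

In get_description, the call to re.search could be replaced by search_with_timeout(rx, data, re.S | re.M), with the existing except TimeoutError branch moving on to the next regex. Starting an interpreter for every pattern adds overhead, so this trades speed for the ability to abandon a stuck search.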

David J.
