
I wrote a crawler to fetch information from a Q&A website. Since not every field is present on every page, I used multiple try/except blocks to handle the missing ones.

def answerContentExtractor( loginSession, questionLinkQueue , answerContentList) :
    while True:
        URL = questionLinkQueue.get()
        try:
            response   = loginSession.get(URL,timeout = MAX_WAIT_TIME)
            raw_data   = response.text

            #These fields must exist, or something went wrong...
            questionId = re.findall(REGEX,raw_data)[0]
            answerId   = re.findall(REGEX,raw_data)[0]
            title      = re.findall(REGEX,raw_data)[0]

        except requests.exceptions.Timeout ,IndexError:
            print >> sys.stderr, URL + " extraction error..."
            questionLinkQueue.task_done()
            continue

        try:
            questionInfo = re.findall(REGEX,raw_data)[0]
        except IndexError:
            questionInfo = ""

        try:
            answerContent = re.findall(REGEX,raw_data)[0]
        except IndexError:
            answerContent = ""

        result = {
                  'questionId'   : questionId,
                  'answerId'     : answerId,
                  'title'        : title,
                  'questionInfo' : questionInfo,
                  'answerContent': answerContent
                  }

        answerContentList.append(result)
        questionLinkQueue.task_done()

This code intermittently raises the following exception at runtime:

UnboundLocalError: local variable 'IndexError' referenced before assignment

The line number in the traceback indicates that the error occurs at the second except IndexError: clause.

Thanks, everyone, for your suggestions. I would love to give each of you the credit you deserve; too bad I can only mark one answer as correct.

Karl Knechtel
Paul Liang
  • Typos; I hand-typed it to strip some unneeded lines. Edited already. – Paul Liang Feb 21 '14 at 06:43
  • Related: [multiple exceptions in one line (except block)](http://stackoverflow.com/questions/6470428/catch-multiple-exceptions-in-one-line-except-block?rq=1) – thefourtheye Feb 21 '14 at 06:50
  • The specific issue here is 2.x-specific, since in 3.x [the `as` keyword must be used to capture the exception](https://stackoverflow.com/questions/2535760/). – Karl Knechtel Feb 07 '23 at 01:49

3 Answers

In Python 2.x, the line

except requests.exceptions.Timeout, IndexError:

is equivalent to

except requests.exceptions.Timeout as IndexError:

Thus, only requests.exceptions.Timeout is caught, and the caught exception object is assigned to the name IndexError. A simpler example:

try:
    true
except NameError, IndexError:
    print IndexError
    #name 'true' is not defined

To catch multiple exceptions, put the names in parentheses:

except (requests.exceptions.Timeout, IndexError):
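As a rough illustration of the parenthesized form, here is a minimal sketch written for Python 3 (the question's code is Python 2). The function extract_title, the three fetch_* stand-ins, and the title pattern are hypothetical, and the built-in TimeoutError stands in for requests.exceptions.Timeout so that the snippet runs without the requests library.

```python
import re


def extract_title(fetch, url):
    """Return the page title, or None if the fetch times out or the field is missing."""
    try:
        raw_data = fetch(url)  # may raise TimeoutError
        return re.findall(r"<title>(.*?)</title>", raw_data)[0]  # may raise IndexError
    except (TimeoutError, IndexError):
        # The tuple means "either of these types"; no name is rebound here.
        return None


# Hypothetical stand-ins for a real HTTP call.
def fetch_ok(url):
    return "<title>Hello</title>"


def fetch_without_title(url):
    return "<p>no title here</p>"


def fetch_timeout(url):
    raise TimeoutError(url)


for fetch in (fetch_ok, fetch_without_title, fetch_timeout):
    print(fetch.__name__, extract_title(fetch, "http://example.invalid/q/1"))
```

Because the tuple only lists exception types, IndexError keeps referring to the builtin in every later except clause.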

Later, an UnboundLocalError can occur because the assignment to IndexError makes it a local variable (shadowing the builtin name):

>>> 'IndexError' in answerContentExtractor.func_code.co_varnames
True

So, if requests.exceptions.Timeout was not raised, IndexError will not have been (incorrectly) defined when the code attempts except IndexError:.

Again, a simpler example:

def func():
    try:
        func # defined, so the except block doesn't run,
    except NameError, IndexError: # so the local `IndexError` isn't assigned
        pass
    try:
        [][1]
    except IndexError:
        pass
func()
#UnboundLocalError: local variable 'IndexError' referenced before assignment

In 3.x, the problem will occur (after fixing the except syntax, which makes the error more obvious) even if the first exception is caught. This is because the local name IndexError is explicitly deleted (as if by del) at the end of the first except clause.
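The 3.x behaviour can be checked with a small sketch. The function names below are made up for illustration, and ValueError from int() plays the role of the first exception; both functions are expected to fail in the second except clause, whether or not the first except clause ran.

```python
def first_except_ran():
    try:
        int("not a number")           # raises ValueError
    except ValueError as IndexError:  # binds the local name, then deletes it
        pass
    try:
        [][1]                         # raises the builtin IndexError
    except IndexError:                # the local IndexError is unbound here
        return "caught IndexError"


def first_except_skipped():
    try:
        int("1")                      # no exception
    except ValueError as IndexError:  # never runs, so the local is never bound
        pass
    try:
        [][1]
    except IndexError:
        return "caught IndexError"


def outcome(func):
    """Run func and report whether the shadowed name caused a failure."""
    try:
        return func()
    except UnboundLocalError:
        return "UnboundLocalError"


print(outcome(first_except_ran))
print(outcome(first_except_skipped))
print("IndexError" in first_except_ran.__code__.co_varnames)
```

In Python 3 the code object is reached through `__code__` rather than the 2.x `func_code` attribute.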

Karl Knechtel
Ashwini Chaudhary
  • It solved the problem. Thank you for explaining what goes on behind the scenes. I now understand the code much better, rather than just having fixed the error. Thanks again. – Paul Liang Feb 21 '14 at 07:10

When you say

except requests.exceptions.Timeout ,IndexError:

Python will catch only requests.exceptions.Timeout, and the exception object will be bound to the name IndexError. It should have been something like this:

except (requests.exceptions.Timeout ,IndexError) as e:
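
To show what the name after as receives, here is a small Python 3 sketch. The helper describe_failure and the two sample actions are hypothetical, and the built-in TimeoutError again stands in for requests.exceptions.Timeout.

```python
def describe_failure(action):
    """Call action() and report which of the two handled exceptions occurred."""
    try:
        action()
    except (TimeoutError, IndexError) as e:
        # e is the exception instance; its type tells the two cases apart.
        return "{}: {}".format(type(e).__name__, e)
    return "ok"


def missing_field():
    return [][0]                 # raises IndexError


def slow_request():
    raise TimeoutError("slow")   # stand-in for a request timeout


print(describe_failure(missing_field))
print(describe_failure(slow_request))
print(describe_failure(lambda: None))
```

Choosing a neutral name such as e avoids rebinding any builtin exception name.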

thefourtheye

except requests.exceptions.Timeout ,IndexError:

means the same as except requests.exceptions.Timeout as IndexError

You should use

except (requests.exceptions.Timeout, IndexError):

instead

Kimvais