9

I'm trying to find the cause of a crash in of our python scripts.

The main structure is this:

def main()
    try:
      dostuff
    except Exception as ex:
      import traceback
      tb = traceback.format_exc()
      import platform
      node = platform.node()
      sendMail([DEBUG_EMAIL], "Alarm exception on %s" % node, str(tb), [])

I get this stacktrace in our main error handling, not in an error email which I'm supposed to.

Traceback (most recent call last):
  File "/usr/lib/python2.6/logging/__init__.py", line 799, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 66: ordinal not in range(128)

From what I see all of the write-calls to logger are inside the try-block, but since it's not caught and handled in my email sending exception block it seems I've missed something. I've checked and the sendMail function doesn't use the logging module at all. So the exception shouldn't originate in my except-block.

I tried adding

sys.tracebacklimit = 10

at the top of the file see where the exception originates but that didn't affect anything. And now I'm out of ideas on how to find where the problem originates.

The script runs once every hour and only crashes about once per week, which makes me assume that it's something related to the input data, but that's only handled by dostuff().

UPDATE:

I've figured out why I only get one row of the stacktrace. Inside the emit() I found this.

        try:
            ... doing stuff, something goes boom with encoding...
        except UnicodeError:
            stream.write(fs % msg.encode("UTF-8")) Here it goes Boom again
        self.flush()
    except (KeyboardInterrupt, SystemExit):
        raise
    except:
        self.handleError(record) Which means it ends up here

And the relevant part of the handleError function looks like this:

 ei = sys.exc_info()
 try:
     traceback.print_exception(ei[0], ei[1], ei[2], None, sys.stderr)

Which only prints the last part of the stacktrace.

dutt
  • 7,909
  • 11
  • 52
  • 85
  • I'm not sure what you mean. – dutt Aug 20 '13 at 13:38
  • The exception seems to occur in the `__init__.py` script of the logging package, so it could also happen when the script loads, and fail to import `logging`, before you enter the try block – val Aug 20 '13 at 13:38
  • @ValentinCLEMENT Added information about usage and crash frequency. – dutt Aug 20 '13 at 13:39
  • @ValentinCLEMENT That line in __init__.py is part of the code that writes the record, so in my mind it should originate in a write-call to a logger. – dutt Aug 20 '13 at 13:43
  • 1
    have you looked at this : http://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20 and this : http://stackoverflow.com/questions/5141559/unicodeencodeerror-ascii-codec-cant-encode-character-u-xef-in-position-0 "Basically, stop using str to convert from unicode to encoded text / bytes." – Marc Aug 20 '13 at 13:47
  • @Marc: Thanks for the tip, but Unicode is working perfectly in the script except for this once per week crash that I can't find the origin of. I don't understand why I don't get a proper stacktrace to my own code. – dutt Aug 20 '13 at 13:49
  • Indeed, that seems to be a writing error. Are you 100% sure that your `sendMail` function does not write anything in your logs ? – val Aug 20 '13 at 13:51
  • @dutt maybe once in a while it is trying to encode a weird character and fails. – Marc Aug 20 '13 at 13:51
  • @ValentinCLEMENT It doesn't import logging at all. – dutt Aug 20 '13 at 13:52
  • @Marc But why isn't it then caught by my except-block? – dutt Aug 20 '13 at 13:52
  • it may fail encoding the mail... sendMail([DEBUG_EMAIL], "Alarm exception on %s" % node, ->**str(tb)**<-, []) – Marc Aug 20 '13 at 13:53
  • @Marc As I wrote sendMail does not use the logging module at all. – dutt Aug 20 '13 at 13:54
  • Woh, ok... this looks weird, I'm no pro in python so I'll leave you here. Sorry and good luck – Marc Aug 20 '13 at 13:57
  • Any std redirection? Like stderr into the logs? – Robert Caspary Aug 20 '13 at 14:11
  • @ValentinCLEMENT Then I get an e-mail. – dutt Aug 20 '13 at 14:25

1 Answers1

3

Basically your problem is twofold

  1. One log stream does not accept 8-bit strings with extended characters, and throws UnicodeError
  2. There is a silly bug in logging module which make it lose the original traceback

The exact cause of the exception is this:

>>> 'ä'.encode('UTF-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

But this exception is not the real problem. This is the part of the 2.6 logging code; the 799 is the last line of the block. The last line is the one causing the problems. Basically something logs a message in a 8-bit byte string, UTF-8 encoded, containing Latin-1 extended letters; but the stream does not like that, and throws an UnicodeError within the try block;

try:
    if (isinstance(msg, unicode) and
        getattr(stream, 'encoding', None)):

        # .... the string is NOT an unicode instance, so ignored
        # for brevity
    else:
        # this line throws the original exception 
        # fs is a bytestring "%s\n", and msg is a bytestring
        # with extended letters, most probably Latin 1.
        # stream.write spits out an UnicodeError on these values
        stream.write(fs % msg)
except UnicodeError:
    # now we get a useless exception report from this code
    stream.write(fs % msg.encode("UTF-8"))

So to debug this you'd want to set up a breakpoint on the aforementioned line 799, and try all loggers if they accept the following string:

logging.getLogger(name).critical('Testing logger: ä')

If you hit line 799 then get the backtrace of the exception, it can shed some light on what is happening...

  • Good answer :) Except this only happens once per week or so, I'd really like to improve the stacktrace to see where in my code the problem originates. My current plan is to copy and modify the logging module to fix the stacktrace printout. Then when I know where the error comes from I can go back to using the standard module. – dutt Aug 20 '13 at 14:30