Suppose I have an exception e, and I would like to format it for logging/printing:

def format_exception(e):
    # type: (Exception) -> Text
    """Log formatter used in a catch-all near the edge."""
    return str(e)  # Python 2.7 only

Specifically, I want to get the exception message - the equivalent of e.message in Python 2.6, or str(e) in Python 2.7.

I have tried

return six.text_type(e)

However, that fails if e.message contains encoded bytes (which, given that I am working in a py2-py3 environment, can happen.)

>>> six.text_type(MyError(u'\U0001F600'))   # OK
>>> six.text_type(MyError(u'\U0001F600'.encode('utf-8')))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)

traceback.format_exception_only (from related question) does almost the right thing (handles both bytes and unicode), but it forces me to split on ': ' to drop the exception name. It also doesn't help that format_exception_only returns a byte string in Python 2, and a unicode string in Python 3.

# python2: str is a byte string
>>> type(traceback.format_exception_only(type(e), e)[0])
str
# python3: str is a unicode string
>>> type(traceback.format_exception_only(type(e), e)[0])
str

So that doesn't quite work. Wrapping that in six.text_type again fails if e.message contains encoded bytes.

What's the right way to fill in format_exception? Do I really need to use traceback2?

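import traceback2

def format_exception(e):
    # type: (Exception) -> Text
    line = traceback2.format_exception_only(type(e), e)[0]
    # Split only on the first ': ' so messages containing ': ' stay intact.
    return line.rstrip('\n').partition(': ')[2]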

I can install and use traceback2, but it feels like there should be a better way.

James Lim
  • I just don't bother with `six.text_type`. If your code base **must** work under both Python 2 AND Python 3 at the same time, the only reliable way I have found is to always prefix human-readable text strings with `u''` and things that are meant to be `bytes` with `b''`. `six.text_type` adds complexity and stops working once it is mixed across so many API boundaries (logging/printing/error handling/stack traces). – metatoaster Sep 03 '18 at 05:02
  • I found using `u''` and `b''` explicitly to work best as well. However, in this case I am getting `e` from some faraway place, and I just need to log it. – James Lim Sep 03 '18 at 05:04
  • How is that faraway `e` produced? Will calling `str(e)` under Python 3 always work, and does calling `unicode(e)` under Python 2 also work? If that's the case I would just force `str = unicode` under a Python 2 environment and forget about it. If that exception is produced through certain heuristics on the running Python environment, that may get tricky because that's premature optimization on the library's part. – metatoaster Sep 03 '18 at 05:10
  • Could be produced by 3rd party libraries, such as one that validates email addresses which may (as of recently) contain emojis. `unicode(e)` doesn't always work if your exception contains encoded bytes e.g. `unicode(Exception(u''.encode('utf-8')))`. The issue here is that different 3rd parties don't follow the same unicode-vs-bytes convention. – James Lim Sep 03 '18 at 05:15
  • The approach I was going to suggest to deal with the gaggle of types (of both `bytes` and `str`) would have been something similar to what `traceback2` does: basically every non-ASCII character is simply rendered in its `\x`-escaped representation, i.e. `[u"Exception: b'\\xe2\\x80\\xa2'\n"]`. Python 2 forces a lot of unhealthy habits because it makes zero distinction between human-readable text and raw bytes, resulting in the **internal API** being completely inconsistent in how those types are handled (which manifests in stdlib breakages). I can't wait till Python 2 is completely dropped everywhere. – metatoaster Sep 03 '18 at 07:04
  • Basically: there is no better way - the only way to do this is to do exactly what `traceback2` does, because `traceback` is broken in Python 2 (along with `StringIO`, `logging`, and any stdlib that uses `str` and/or `unicode` types wrongly) – metatoaster Sep 03 '18 at 07:05
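For reference, the fallback discussed in these comments can be sketched without `traceback2`. This is only a minimal sketch under stated assumptions, not what `traceback2` itself does: it assumes byte messages are UTF-8 (the `encoding` parameter is hypothetical), and it decodes with `errors='replace'` rather than producing `\x` escapes. The names `PY2` and `text_type` are local stand-ins for the `six` equivalents.

```python
import sys

PY2 = sys.version_info[0] == 2
# Only the selected branch is evaluated, so `unicode` is never looked up on Python 3.
text_type = unicode if PY2 else str  # noqa: F821


def format_exception(e, encoding='utf-8'):
    # type: (BaseException, str) -> Text
    """Return the message of e as text without raising on undecodable bytes.

    Assumption: byte messages are encoded with `encoding` (UTF-8 by default).
    """
    args = getattr(e, 'args', ())
    # A single bytes argument (a plain str on Python 2) is decoded directly;
    # invalid sequences become U+FFFD instead of raising.
    if len(args) == 1 and isinstance(args[0], bytes):
        return args[0].decode(encoding, 'replace')
    try:
        return text_type(e)
    except UnicodeError:
        # Python 2 only: a custom __str__ returned non-ASCII bytes.
        return bytes(e).decode(encoding, 'replace')
```

One trade-off of this design: exception types that customise `__str__` (for example `KeyError`, which shows the `repr` of its key) lose that formatting whenever their single argument is a byte string, because the argument is decoded directly.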

0 Answers