
I'm writing Python 2 code with Unicode strings (importing `unicode_literals`), and I'm having issues with raising exceptions.

```python
# -*- coding: utf-8 -*-

from __future__ import unicode_literals

raise Exception('Tést')
```

When doing this, the 'Tést' message is missing from the traceback printed on the terminal.

I can work around this with

```python
raise Exception('Tést'.encode('utf-8'))
```

I'd rather find a global solution than have to do this in every `raise Exception` statement.

(Since I'm using PyQt's `tr()` function in exception messages, special characters must be handled, and I can't know at coding time whether `encode('utf-8')` is necessary.)

Worse, sometimes I want to catch an exception, get its message, and raise a new exception whose message concatenates a base string with the first exception's message.

I have to do it this way:

```python
try:
    raise TypeError('Tést'.encode('utf-8'))
except Exception as e:
    raise Exception('Exception: {}'.format(str(e).decode('utf-8')).encode('utf-8'))
```

but I really wish it could be less cumbersome (and this example doesn't even include the self.tr() calls).

Is there a simpler way?

(And as a side question: are things simpler with Python 3? Can exceptions use Unicode strings?)
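For the side question: in Python 3, `str` is Unicode text, so the catch-and-re-raise example above needs neither `.encode()` nor `.decode()`. A minimal sketch, Python 3 only; `reraise_with_prefix` is a hypothetical helper name, and the `from e` chaining is an optional extra:

```python
# Python 3 only: str is Unicode text, so no .encode()/.decode() is needed.


def reraise_with_prefix():
    """Hypothetical helper mirroring the catch-and-re-raise example."""
    try:
        raise TypeError('Tést')
    except TypeError as e:
        # str(e) is already text; "from e" keeps the original as __cause__.
        raise Exception('Exception: {}'.format(e)) from e


try:
    reraise_with_prefix()
except Exception as exc:
    caught = exc  # "exc" itself is unbound when the except block ends
```

With `raise ... from e`, the original `TypeError` stays reachable as `__cause__`, and an uncaught traceback shows both exceptions.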

Jérôme
  • Disclaimer: I am not familiar with PyQt's tr() and how you use it, so I may not understand the specifics of the issue very well. Just to make sure you looked at related things: -- Did you look at global stderr hooks? (http://stackoverflow.com/questions/8288717/python-print-passing-extra-text-to-sys-stdout, http://www.macfreek.nl/memory/Encoding_of_Python_stdout); -- Did you look at exception hooks? (http://www.creativelydo.com/blog/how-to-globally-customize-exceptions-in-python/) - a lot of freedom in formatting exceptions; -- Did you try using your own exception class? – morfizm Mar 17 '15 at 11:02
  • You've already answered your own question: Python3 is the answer. If you're starting a new project, you really should not be using Python2 unless you have very good reasons for doing so (e.g. inescapable dependencies). The unicode improvements in Python3 are probably one of its biggest selling points - in fact, the issues you are facing are *exactly* why those improvements were made. – ekhumoro Mar 17 '15 at 19:44
  • Yes, ekhumoro, I'd rather go Python3. We have to use Python 2 because of a dependency. I'll double-check there is no alternative. In the meantime, as a principle, I'm doing my best to be future-compliant and do things the way they should be. Since you confirm there is no ideal way, I'll choose my workaround. The "own Exception class" is clearly not the canonical way, but it looks like a reasonable workaround. – Jérôme Mar 17 '15 at 20:32
  • Thanks morfizm for the own Exception class suggestion. I edited my question. – Jérôme Mar 17 '15 at 20:33
  • You're going to have trouble if you initialize `MyException` with an already encoded `utf-8` byte string, since it will try to decode it (from ASCII) before it encodes it again. – Mark Ransom Mar 17 '15 at 21:21
  • Yes. I use `from __future__ import unicode_literals`, and the strings returned by Qt's self.tr() are Unicode (maybe because of the `CODECFORTR = UTF-8` parameter in the `.pro` file). But I can add a test in MyException to check whether the input is a byte string or Unicode. – Jérôme Mar 18 '15 at 12:53
  • @Jérôme. The `tr()` function always returns a `QString` with Python2, unless you [use sip to automatically convert to the equivalent python type](http://pyqt.sourceforge.net/Docs/PyQt4/incompatible_apis.html) instead. Either way, the returned object will always represent a unicode string. The settings in the pro file have no influence whatsoever on runtime behaviour. You must use `QTextCodec.setCodecForTr` to ensure that string objects passed to `tr` are decoded to unicode correctly (where necessary). – ekhumoro Mar 18 '15 at 18:26
  • @Jérôme. The way to deal with unicode can be stated very simply: decode in, encode out. Never use byte strings anywhere in your application unless you can be certain they will only ever contain 7-bit ascii characters. Otherwise, all other strings used within your application must be unicode. The only time such strings should be encoded to bytes is when they leave your application (e.g. when printing them to stdout). For exceptions, centralise the handling with [`sys.excepthook`](https://docs.python.org/3/library/sys.html#sys.excepthook) and do all the encoding in one place. – ekhumoro Mar 18 '15 at 19:00
  • @ekhumoro the problem in this case is that the exception object doesn't properly encode the unicode string contained within itself, either losing the message altogether or replacing non-ASCII characters with an escape sequence. It's a flaw in the trace output processing. I imagine that Python 3 handles this more gracefully, but I haven't tried it. – Mark Ransom Mar 18 '15 at 19:28
  • Thanks @ekhumoro for correcting my side comments on `tr()`. I do use `sip.setapi('QString', 2)`. I try to remove byte strings from my code. I had to modify my csv file reader. Only exceptions still give me trouble. Unfortunately, `Exception`s talk byte string. AFAIU, this is because when writing `except MyException as e` then `format(e)`, `e.__str__()` is called and returns a byte string. Is there a way to let it return Unicode? Are you suggesting I reimplement `__str__()` in `MyException` to return Unicode, then reimplement `sys.excepthook` so that it can handle Unicode? – Jérôme Mar 18 '15 at 19:28
  • @Jérôme. You don't want `__str__` to return unicode: it must return correctly encoded bytes (i.e. `return self.message.encode('utf-8')`). You should probably also define `__unicode__`, which should just return `self.message`. If you didn't do the latter, you could only use `format(e)` with byte strings, which would have to be specified explicitly (since you're using unicode literals). – ekhumoro Mar 18 '15 at 20:54
  • This makes sense. Question updated. – Jérôme Mar 18 '15 at 21:06
  • Related: Python issue 2517 – [Error when printing an exception containing a Unicode string](https://bugs.python.org/issue2517) – Piotr Dobrogost Jan 08 '19 at 22:11
  • Related: [possible to raise exception that includes non-english characters in python 2?](https://stackoverflow.com/q/13256777/95735) – Piotr Dobrogost Jan 08 '19 at 22:26
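ekhumoro's suggestion to do all the encoding in one place with `sys.excepthook` could look roughly like the sketch below. It rests on assumptions: exception messages are Unicode text (per the "decode in, encode out" rule), the terminal expects UTF-8, and chained exceptions are not printed. `format_exception_text` and `utf8_excepthook` are hypothetical names. Explicit `u''` literals keep it independent of `unicode_literals`, so the same code should run on Python 2.7 and Python 3.

```python
import sys
import traceback


def _to_text(s):
    # traceback helpers return byte strings on Python 2 and str on Python 3.
    return s.decode('utf-8', 'replace') if isinstance(s, bytes) else s


def format_exception_text(exc_type, exc_value, exc_tb):
    """Build the whole traceback as a single Unicode string."""
    lines = [u'Traceback (most recent call last):\n']
    lines.extend(_to_text(line) for line in traceback.format_tb(exc_tb))
    # u'{}'.format() goes through unicode(exc_value) on Python 2 and
    # str(exc_value) on Python 3, so a Unicode message is neither dropped
    # nor escaped.
    lines.append(u'{}: {}\n'.format(exc_type.__name__, exc_value))
    return u''.join(lines)


def utf8_excepthook(exc_type, exc_value, exc_tb):
    text = format_exception_text(exc_type, exc_value, exc_tb)
    # Encode once, on the way out. Python 3's sys.stderr is a text stream
    # with a binary .buffer; Python 2's sys.stderr accepts bytes directly.
    stream = getattr(sys.stderr, 'buffer', sys.stderr)
    stream.write(text.encode('utf-8'))
    stream.flush()


sys.excepthook = utf8_excepthook
```

The hook only runs for uncaught exceptions; messages built inside `except` blocks still have to be Unicode for `u'{}'.format()` to succeed on Python 2.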

1 Answer


Thanks to the comments below the question, I came up with this.

The idea is to use a custom Exception subclass.

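```python
# -*- coding: utf-8 -*-

from __future__ import unicode_literals

class MyException(Exception):

    def __init__(self, message):

        if isinstance(message, unicode):
            # Give the base class UTF-8 bytes, so str(e) and the traceback
            # printer get bytes they can write out as-is.
            super(MyException, self).__init__(message.encode('utf-8'))
            self.message = message

        elif isinstance(message, str):
            # Already a byte string: assume UTF-8 and keep a Unicode copy.
            super(MyException, self).__init__(message)
            self.message = message.decode('utf-8')

        # This shouldn't happen...
        else:
            raise TypeError('message must be unicode or a UTF-8 byte string')

    def __unicode__(self):
        # Used by unicode(e), and by '{}'.format(e) under unicode_literals.
        return self.message

class MySubException(MyException):
    pass

try:
    raise MyException('Tést')
except MyException as e:
    print(e.message)
    raise MySubException('SubException: {}'.format(e))
```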
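If the same class also has to run under Python 3, one possible variant (a sketch, not tested with PyQt) picks the text type at import time and follows ekhumoro's advice: `__unicode__` returns the text, while `__str__` returns UTF-8 bytes on Python 2 and text on Python 3. UTF-8 byte input is decoded up front, which addresses Mark Ransom's concern about already-encoded messages. `PY2` and `text_type` are hypothetical helper names.

```python
import sys

PY2 = sys.version_info[0] == 2
# Hypothetical alias: the Unicode text type is `unicode` on Python 2,
# `str` on Python 3. The conditional never evaluates `unicode` on Python 3.
text_type = unicode if PY2 else str  # noqa: F821


class MyException(Exception):
    """Exception whose message is always stored as Unicode text."""

    def __init__(self, message):
        if isinstance(message, bytes):
            # Byte strings are assumed to be UTF-8: decode them now instead
            # of letting an implicit ASCII decode fail later.
            message = message.decode('utf-8')
        elif not isinstance(message, text_type):
            raise TypeError('message must be text or UTF-8 bytes, not %s'
                            % type(message).__name__)
        super(MyException, self).__init__(message)
        self.message = message

    def __unicode__(self):
        # Used by unicode(e) and u'{}'.format(e) on Python 2.
        return self.message

    def __str__(self):
        # Python 2 expects bytes from __str__, Python 3 expects text.
        return self.message.encode('utf-8') if PY2 else self.message


class MySubException(MyException):
    pass


# Usage: UTF-8 bytes and Unicode text end up as the same message.
try:
    raise MyException(b'T\xc3\xa9st')
except MyException as e:
    wrapped = MySubException(u'SubException: {}'.format(e))
```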
Jérôme
  • Elegant solution. Just a reminder that we should add `# -*- coding:utf8 -*-` at the beginning of the source file, or else we will get unexpected characters in the error message :) – wllbll Aug 10 '17 at 03:34