I'm getting a unicode error only when overriding my class' __str__ method. What's going on?

In Test.py:

class Obj(object):

    def __init__(self):
        self.title = u'\u2018'

    def __str__(self):
        return self.title


print "1: ", Obj().title
print "2: ", str(Obj())

Running this I get:

$ python Test.py
1:  ‘
2: 
Traceback (most recent call last):
  File "Test.py", line 11, in <module>
    print "2: ", str(Obj())
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2018' in position 0: ordinal not in range(128)

EDIT: Please don't just say that str(u'\u2018') also raises an Error! (while that may be related). This circumvents the entire purpose of built-in method overloading --- at no point should this code call str(u'\u2018')!!

DilithiumMatrix
  • 1
    `str(Obj().title)` has the same behaviour, it's not related to `__str__` – Hacketo Sep 11 '15 at 22:14
  • possible duplicate of [UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)](http://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20) – Hacketo Sep 11 '15 at 22:17
  • 4
    afaik `__str__` is contractually obligated to return ascii bytes and not unicode, not doing that may lead to issues ... try `def __str__(self):return self.title.encode("utf8")` – Joran Beasley Sep 11 '15 at 22:17
  • 1
    @DilithiumMatrix the fact is that the error comes from the unicode in `title` and not from the overloading of `__str__`; http://stackoverflow.com/help/mcve would have highlighted this. The question is a duplicate – Hacketo Sep 11 '15 at 22:25
  • 1
    `self.title` is a unicode object as seen in your constructor. You must override `__unicode__` – Malik Brahimi Sep 11 '15 at 22:29
  • @Hacketo, the question is not about the error! It's about **why the error only happens in the overloading method**! – DilithiumMatrix Sep 11 '15 at 22:30
  • @MalikBrahimi can you expand on that? Why? – DilithiumMatrix Sep 11 '15 at 22:31
  • @JoranBeasley I don't completely follow. So, you're saying that `print` is explicitly expecting `ascii` in the (2) case? But not in (1)? – DilithiumMatrix Sep 11 '15 at 22:33
  • 4
    @DilithiumMatrix It doesn't happen *in* the overloading method. `str(Obj())` will call `str(Obj().__str__())`, which becomes `str(u'\u2018')` which throws `UnicodeEncodeError`. I don't understand why you're so hostile to people who are giving you the answer. – Adam Sep 11 '15 at 22:34
  • @Adam, so `str()` works completely differently from `len()` then? i.e. `len()` *definitely does not* call `len( Obj().__len__() )` ... where is this documented? I'm sorry if my response has been overly hostile --- but I think the 'close' vote, 'duplicate', and given answers are all misunderstanding the problem.... – DilithiumMatrix Sep 11 '15 at 22:44
  • 2
    You're hostile because you have a poor understanding of how Python works yet are shouting and downvoting people who are helping you. Good luck. – Adam Sep 11 '15 at 22:52
  • @DilithiumMatrix [`docs`](https://docs.python.org/2/reference/datamodel.html#object.__str__) state : `Called by the str()` ; not saying that `str(Obj())` is a short way to call `Obj().__str__()` – Hacketo Sep 11 '15 at 23:19

2 Answers

3

You're using Python 2.x. str() calls __str__ and expects you to return a string—that is, a str. But you're not; you're returning a unicode object. So str() helpfully tries to convert that to a str since it's what str() is supposed to return.

Now, in Python 2.x strings are sequences of bytes, not codepoints, so Python is trying to convert your Unicode object to a sequence of bytes. Since you didn't (and can't, in this scenario) specify what encoding to use when making the string, Python uses the default encoding of ASCII. This fails because ASCII can't represent the character.
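
To make that conversion concrete, here is a minimal sketch of the encode step that Python 2's `str()` effectively performs on the returned `unicode` object. `try_encode` is a hypothetical helper used only for illustration, and the snippet is written so it also runs on Python 3, where the encode has to be requested explicitly:

```python
title = u'\u2018'  # LEFT SINGLE QUOTATION MARK, the same value as in the question


def try_encode(text, encoding):
    """Return the encoded bytes, or a short failure message if encoding fails.

    Hypothetical helper for illustration; not part of the standard library.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return "failed: %s" % exc.reason


# ASCII has no code for U+2018, so this is the step that blows up.
ascii_result = try_encode(title, 'ascii')

# An encoding that can represent the character succeeds.
utf8_result = try_encode(title, 'utf-8')
```

Here `ascii_result` holds the same "ordinal not in range(128)" reason that appears in the traceback, while `utf8_result` holds the three UTF-8 bytes for the character.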

Possible solutions:

  1. Use Python 3, where all strings are Unicode. This will provide you with an entertainingly different set of things to wrap your head around, but this won't be one of them.

  2. Override __unicode__() instead of __str__() and use unicode() instead of str() when converting your object to a string. You still have the problem (shared with Python 3) of how to get that converted into a sequence of bytes that will output correctly.

  3. Figure out what encoding your terminal is using (i.e. sys.stdout.encoding) and have __str__() convert the Unicode object to that encoding before returning it. Note that there's still no guarantee that the character is representable in that encoding; you can't convert your example string to the default Windows terminal encoding, for example. In this case you could fall back to e.g. unicode-escape encoding if you get an exception trying to convert to the output encoding.
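
Option 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: `encode_for_output` is a hypothetical helper name, and falling back to ASCII when `sys.stdout.encoding` is unavailable is my assumption, not something the interpreter does for you.

```python
import sys


def encode_for_output(text, encoding=None):
    """Encode text for the terminal, escaping it if the encoding can't represent it.

    Hypothetical helper for illustration; not part of the standard library.
    """
    # sys.stdout.encoding can be None (for example when output is piped);
    # assuming ASCII in that case is a conservative choice made for this sketch.
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # Fall back to backslash escapes, which are always plain ASCII bytes.
        return text.encode('unicode_escape')


escaped = encode_for_output(u'\u2018', 'ascii')   # character not representable
encoded = encode_for_output(u'\u2018', 'utf-8')   # character representable
```

In Python 2, `__str__` could then return `encode_for_output(self.title)`, so that `str(Obj())` always receives a byte string and never has to fall back to the implicit ASCII conversion.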

kindall
  • Thanks! This is extremely clear. It seems strange that `str()`, in addition to calling `__str__()` also tries to make the conversion in this way. For example, if `def __str__(self): return 5`, then this gives the error `TypeError: __str__ returned non-string (type int)` --- i.e. it does not just return `str(5)` (which works fine); nor, in my case, does it give the error `returned non-string (type unicode)` or something. – DilithiumMatrix Sep 11 '15 at 23:07
  • Yeah, I think when they added `unicode` they were trying to be helpful, but there are inconsistencies like that. – kindall Sep 11 '15 at 23:08
0

The problem is that str() cannot handle u'\u2018' (a unicode object): it tries to encode it as ASCII, and ASCII has no character for it.

>>> str(u'\u2018')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2018' in position 0: ordinal not in range(128)
>>> 

You can look at this for more info...

Community
  • @DilithiumMatrix on this line `print "2: ", str(Obj())`. – Adam Sep 11 '15 at 22:32
  • @Adam why isn't that line calling the overriding method?? – DilithiumMatrix Sep 11 '15 at 22:34
  • 2
    Well, it is, but the overriding method returns self.title and that is set to str(u'\u2018') in the init override.... – Sebastiaan Mannem Sep 11 '15 at 22:35
  • @SebastiaanMannem no, it's set to `unicode` in `__init__` – DilithiumMatrix Sep 11 '15 at 22:36
  • @Adam so you're saying the code is doing: `str( Obj().__str__() )`? Why? – DilithiumMatrix Sep 11 '15 at 22:38
  • Because that's how Python converts arbitrary objects to printable strings. It calls their `__str__`. That is literally the only purpose of that function. How did you think it worked? – Adam Sep 11 '15 at 22:40
  • 2
    Your problem is that you return a `unicode` object where you are obliged to return an ASCII `str`. If you don't like it, use Python 3. – Adam Sep 11 '15 at 22:41
  • @Adam, if that were true, then the first example would also throw an error. – DilithiumMatrix Sep 11 '15 at 22:41
  • 2
    @DilithiumMatrix False. The print function can print unicode objects directly. – Adam Sep 11 '15 at 22:42
  • @DilithiumMatrix: You are absolutely right. It is set to a Unicode string (being u'\u2018'). And when str(Obj()) is executed, the function Obj.__str__() returns that unicode value to the str() function. The str() function receives the Unicode string, and that throws the error. – Sebastiaan Mannem Sep 11 '15 at 22:43
  • @Adam, here `__str__()` return a unicode object. You're saying that `print` can handle that directly. But you're also saying that `print` calls `str()` on everything to handle it. – DilithiumMatrix Sep 11 '15 at 22:46
  • 1
    In your first print you are passing a unicode object to the print function. The print function doesn't need to do any conversion to output that object. In the second print line **YOU ARE CALLING `str`**. – Adam Sep 11 '15 at 22:49
  • 1
    Yes, because that never creates a `str`! – Adam Sep 11 '15 at 22:51