
I was given to understand that print obj calls obj.__str__(), which in turn returns a string to print to the console. Now I had a problem with Unicode: I could not print any non-ASCII characters and got the typical "ordinal not in range(128)" error.

While experimenting, I found that the following worked:

print obj.__str__()
print obj.__repr__()

Both methods do exactly the same thing (__str__() just returns self.__repr__()). What did not work:

print obj

The problem only occurred with characters outside the ASCII range. The final solution was to do the following in __str__():

return self.__repr__().encode(sys.stdout.encoding)

Now every variant works. My question is: where is the difference, and why does it work now? If nothing had worked before, I would understand why this fixes it. But why did only the top part work, and not the bottom?

OS is Windows 7 x64 with the default Windows command prompt, and the encoding is reported as cp850. This is more of a general question to understand Python. My problem is already solved, but I am not 100% happy, mostly because calling str(obj) now yields a string encoded for the console, which is not necessarily the encoding I want.

# -*- coding: utf-8 -*- 
class Sample(object):

    def __init__(self):
        self.name = u"üé"

    def __repr__(self):
        return self.name

    def __str__(self):
        return self.name

obj = Sample()
print obj.__str__(), obj.__repr__(), obj

Remove the last obj and it works. Keep it and it crashes with

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
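The same error can be reproduced without print or any class. A minimal sketch, assuming the two characters from the Sample class: encoding them with cp850 (the reported console encoding) succeeds, while encoding them with ASCII raises exactly the error quoted above.

```python
name = u"\xfc\xe9"  # u-umlaut and e-acute, the same text as in Sample

# cp850 (the reported console encoding) contains both characters.
console_bytes = name.encode("cp850")
print(repr(console_bytes))

# ASCII only covers code points 0-127, so this raises UnicodeEncodeError.
try:
    name.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = str(exc)
print(ascii_error)
```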
javex
  • What Python version are you running? – Simeon Visser Jul 03 '12 at 19:37
  • Show a minimal example of obj's class with samples of the strings you print. – Mark Tolonen Jul 03 '12 at 20:05
  • Were you maybe looking for `obj.__unicode__()`? – D K Jul 03 '12 at 21:02
  • Which version of Python are you using? – Rodrigue Jul 03 '12 at 21:38
  • Python version is 2.7.2, and I added a sample class. No, I am not looking for `__unicode__`: as far as I know, that gets called when you call `unicode(obj)`, whereas print calls `str(obj)` (or it should). Testing with output also shows that `__unicode__()` never gets called unless you call either `unicode(obj)` or `obj.__unicode__()`. – javex Jul 03 '12 at 21:40
  • Aside: you probably shouldn't call `__functions__` from outside an object. Use `str(obj)` and `repr(obj)` instead. – millimoose Jul 03 '12 at 22:11
  • @millimoose thanks, I will take that into consideration. These were just for testing purposes, but as stated in your answer: My problem was caused because I did call a magic function directly. Thanks! – javex Jul 04 '12 at 21:01

2 Answers


My guess is that print does something like the following for an object obj it's meant to print:

  1. Checks whether obj is a unicode object. If so, encodes it with sys.stdout.encoding and prints the result.
  2. Checks whether obj is a str. If so, prints it directly.
  3. If obj is anything else, calls str(obj) and prints that.
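Written out as code, the guess looks roughly like this. It is a sketch under the answer's own assumptions, not CPython's implementation; sketch_print_bytes, TEXT and BYTES are hypothetical names. The code is written to run on both Python 2.7 and 3, so type(u"") stands in for unicode and type(b"") for Python 2's str.

```python
# Hypothetical model of Python 2's "print obj", following the three guessed steps.
TEXT = type(u"")   # `unicode` on Python 2, `str` on Python 3
BYTES = type(b"")  # `str` on Python 2, `bytes` on Python 3


def sketch_print_bytes(obj, stdout_encoding):
    """Return the bytes that `print obj` would write to the console."""
    if isinstance(obj, TEXT):
        # Step 1: unicode is encoded with the console's encoding.
        return obj.encode(stdout_encoding)
    if isinstance(obj, BYTES):
        # Step 2: a byte string is written unchanged.
        return obj
    # Step 3: anything else goes through str(obj), modelled here as
    # obj.__str__() plus Python 2's implicit ASCII encoding of unicode.
    result = obj.__str__()
    if isinstance(result, TEXT):
        result = result.encode("ascii")
    return result


class Sample(object):
    def __init__(self):
        self.name = u"\xfc\xe9"

    def __str__(self):
        return self.name


obj = Sample()
direct = sketch_print_bytes(obj.__str__(), "cp850")  # step 1: succeeds

try:
    sketch_print_bytes(obj, "cp850")  # step 3: ASCII encoding fails
    indirect_failed = False
except UnicodeEncodeError:
    indirect_failed = True
```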

Step 1 is why print obj.__str__() works in your case.

Now, what str(obj) does is:

  1. Calls obj.__str__().
  2. If the result is a str, returns it.
  3. If the result is a unicode object, encodes it with the default encoding ("ascii") and returns that.
  4. Otherwise, raises a TypeError.

Calling obj.__str__() directly skips steps 2-3, which is why you don't get the encoding failure.
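The str() steps can be modelled the same way. This is again a hypothetical sketch (sketch_py2_str, ReturnsUnicode and ReturnsBytes are illustrative names); the default_encoding parameter stands in for sys.getdefaultencoding(), which is "ascii" on a stock Python 2 install. Nothing in the model looks at sys.stdout.encoding.

```python
# Hypothetical model of Python 2's str(obj), following the four steps above.
TEXT = type(u"")   # `unicode` on Python 2
BYTES = type(b"")  # `str` on Python 2


def sketch_py2_str(obj, default_encoding="ascii"):
    """Model of str(obj); default_encoding mirrors sys.getdefaultencoding()."""
    result = obj.__str__()  # step 1
    if isinstance(result, BYTES):
        return result  # step 2: already a byte string
    if isinstance(result, TEXT):
        # step 3: the console encoding is never consulted here
        return result.encode(default_encoding)
    # step 4: CPython 2.7 raises TypeError for a non-string result
    raise TypeError("__str__ returned non-string (type %s)" % type(result).__name__)


class ReturnsUnicode(object):
    def __str__(self):
        return u"\xfc\xe9"


class ReturnsBytes(object):
    def __str__(self):
        # Already encoded, like the question's workaround
        return u"\xfc\xe9".encode("cp850")


passthrough = sketch_py2_str(ReturnsBytes())  # step 2: returned unchanged

try:
    sketch_py2_str(ReturnsUnicode())  # step 3: ASCII cannot encode these
    unicode_failed = False
except UnicodeEncodeError:
    unicode_failed = True
```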

The problem isn't caused by how print works, it's caused by how str() works. str() ignores sys.stdout.encoding. Since it doesn't know what you want to do with the resulting string, the default encoding it uses can be considered arbitrary; ascii is as good or bad a choice as any.
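To see why there is no single correct choice, this sketch encodes the same text with three encodings: cp850 (the console encoding from the question), plus latin-1 and utf-8, which are picked only for comparison. Each produces different bytes, so str() would need to know where the result is going.

```python
text = u"\xfc\xe9"  # u-umlaut, e-acute

# The same two characters become different bytes depending on the encoding.
encoded = {}
for encoding in ("cp850", "latin-1", "utf-8"):
    encoded[encoding] = text.encode(encoding)
    print("%-8s %r" % (encoding, encoded[encoding]))
```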

To prevent this bug, make sure you return a str from __str__() as the documentation tells you to do. A pattern you could use for Python 2.x might be:

import sys

class Foo(object):
    def __unicode__(self):
        return u'whatever'

    def __str__(self):
        # Python 2's str() needs a byte string, so encode explicitly
        return unicode(self).encode(sys.stdout.encoding)

(If you're sure you don't need the str() representation for anything but printing to the console.)
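The pattern above assumes sys.stdout.encoding is always set. On Python 2 it is None when output is redirected to a file or a pipe, and encode(None) then fails. A possible sketch, assuming you are willing to encode at the call site instead: encode_for_stream and FakeStream are hypothetical names, and the "utf-8" fallback and "replace" error handler are illustrative choices, not requirements.

```python
import sys


def encode_for_stream(text, stream=None, fallback="utf-8"):
    """Hypothetical helper: encode unicode text for a particular output stream."""
    if stream is None:
        stream = sys.stdout
    # On Python 2, stream.encoding is None for redirected output,
    # so fall back to an explicit encoding in that case.
    encoding = getattr(stream, "encoding", None) or fallback
    # "replace" substitutes "?" for characters the encoding lacks instead of raising.
    return text.encode(encoding, "replace")


class FakeStream(object):
    """Stand-in for a console or a redirected stdout (illustration only)."""

    def __init__(self, encoding):
        self.encoding = encoding


text = u"\xfc\xe9"
console = encode_for_stream(text, FakeStream("cp850"))
redirected = encode_for_stream(text, FakeStream(None))
ascii_only = encode_for_stream(text, FakeStream("ascii"))
```

Keeping the encoding decision outside __str__ means str(obj) is never tied to whichever console happens to be attached.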

millimoose
  • Thank you that is the perfect explanation I was looking for. This surely explains my problem. Now what if I *do* want to have more than just console output. What would be a good solution? My approach was to define a second parameter like this: `__str__(self, encoding=sys.stdout.encoding)`. Does this seem like a good idea? – javex Jul 04 '12 at 20:59
  • @user1461135 There isn't really a situation where you would pass extra parameters into `__str__()`, seeing as you're not meant to call it directly. I'd just use `unicode(obj).encode('yadda')` wherever you'd want to call `obj.__str__(encoding='yadda')`, it's less likely to surprise people. – millimoose Jul 04 '12 at 22:00

First, if you look at the online documentation, __str__ and __repr__ have different purposes and should create different outputs. So calling __repr__ from __str__ is not the best solution.
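The difference in purpose is easiest to see with a class that has an obvious readable form. A minimal sketch with a hypothetical Point class: __repr__ aims at an unambiguous, developer-facing string (ideally one that looks like the expression that recreates the object), while __str__ aims at a readable one.

```python
class Point(object):
    """Hypothetical example class, not from the question."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous and developer-facing; looks like valid Python.
        return "Point(%r, %r)" % (self.x, self.y)

    def __str__(self):
        # Readable and user-facing.
        return "(%s, %s)" % (self.x, self.y)


p = Point(1, 2)
print(repr(p))  # Point(1, 2)
print(str(p))   # (1, 2)
```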

Second, print will call __str__ and does not expect to receive non-ASCII characters because, well, print cannot guess how to convert them.

Finally, in recent versions of Python 2.x, __unicode__ is the preferred method of creating a string representation for an object. There is an interesting explanation in the Stack Overflow question "Python str versus unicode".

So, to try and really answer the question, you could do something like:

class Sample(object):

    def __init__(self):
        self.name = u"\xfc\xe9"

    # No need to implement __repr__. Let Python create the object repr for you

    def __str__(self):
        return unicode(self).encode('utf-8')

    def __unicode__(self):
        return self.name
Rodrigue