
I thought I understood Unicode and Python, but this issue confuses me a lot. Look at this small test program:

# -*- coding: utf-8 -*-

class TestC(object):

    def __str__(self):
        return u'äöü'

import sys
print sys.version
print sys.stdin.encoding
print sys.stdout.encoding    
print u'öäü' #this works
x = TestC()
print x #this doesn't always work

When I run this from my Bash terminal on Ubuntu, I get the following result:

2.7.3 (default, Aug  1 2012, 05:14:39) 
[GCC 4.6.3]
utf-8
utf-8
öäü
Traceback (most recent call last):
  File "test_mod.py", line 14, in <module>
    print x #this doesn't '
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

However, when I run the same program from within Eclipse (using the PyDev plugin), both print statements work flawlessly. The console window shows:

2.7.3 (default, Aug  1 2012, 05:14:39) 
[GCC 4.6.3]
utf-8
utf-8
öäü
äöü

Can someone please explain to me what the issue is? Why does the __str__ method work in one case but not in the other? What is the best way to fix this?

  • Why would you want to return unicode from a `__str__` function? – Daniel Roseman Oct 10 '12 at 17:57
  • As Edward has pointed out, the right place to return my unicode string is probably the `__unicode__` method. But I am still wondering: why does the wrong code work in my Eclipse environment, which seems to be identical? – Pappenheimer Oct 11 '12 at 10:57

1 Answer


See this related question: Python __str__ versus __unicode__

Basically, you should probably implement the special method __unicode__ rather than __str__, and add a stub __str__ that calls __unicode__:

def __str__(self):
    return unicode(self).encode('utf-8')
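
For context, here is a minimal sketch of how the whole class might look with that stub in place. It assumes the underlying cause is that Python 2 converts a unicode object returned from __str__ to a byte string using the interpreter's default encoding, which is normally ASCII. The `PY2` flag and the `ascii_error` variable are illustrative names, not part of the original program, and the version check is only there so that the same file also runs on Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import sys

# Hypothetical helper flag: True when running under Python 2.
PY2 = sys.version_info[0] == 2


class TestC(object):

    def __unicode__(self):
        # Return text (unicode on Python 2, str on Python 3).
        return 'äöü'

    def __str__(self):
        text = self.__unicode__()
        # Python 2's str is a byte string, so encode explicitly instead of
        # relying on the implicit conversion with the default codec.
        # Python 3's str is already text and is returned unchanged.
        return text.encode('utf-8') if PY2 else text


# Encoding the same text as ASCII fails, which mirrors the error
# reported in the traceback from the terminal run.
try:
    'äöü'.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

x = TestC()
rendered = str(x)  # UTF-8 bytes on Python 2, text on Python 3
```

With this in place, `print x` on Python 2 writes the UTF-8 bytes, which a UTF-8 terminal should display as äöü. Printing `sys.getdefaultencoding()` in both environments would show whether they differ in that default, which could explain why the original code happened to work in Eclipse.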
Edward Loper