
I have a class Chunk with text fields title and text. When I try to print them, I get (surprise, surprise!) a UnicodeDecodeError. The error occurs when I format an output string, but when I just concatenate text and title and return the result, I get no error:

class Chunk:
  # init, fields, ...

  # this implementation will give me an error
  def __str__( self ):
    return u'{0} {1}'.format ( enc(self.text), enc(self.title) )

  # but this is OK - all is printed without error
  def __str__( self ):
    return enc(self.text) + enc(self.title)

def enc(x):
  return x.encode('utf-8','ignore') # tried many combinations of arguments...


c = Chunk()
c.text, c.title = ... # feed from external file
print c

Boom! Error!

return u'{0} {1}'.format ( enc(self.text), enc(self.title) )
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2844: ordinal not in range(128)

I think I used all the possible combinations of encode/decode/utf-8/ascii/replace/ignore/...

(The Python Unicode issue is really irritating!)

Jakub M.

2 Answers

  1. You should override __unicode__, not __str__, when you return a unicode object.
  2. There is no need to call .encode(), since the input is already unicode. Just write

    def __unicode__(self):
        return u"{0} {1}".format(self.text, self.title)
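
A minimal sketch of why the original version fails and how the two methods can coexist. In Python 2, formatting byte strings into a unicode template decodes them as ASCII first; the explicit decode below reproduces that step. The constructor arguments and the `__str__` fallback are assumptions added for illustration (the question's class is only partly shown), and the version check is there only so the sketch also runs on Python 3.

```python
import sys

# A byte string, like the output of the question's enc() helper.
encoded = u'caf\xe9'.encode('utf-8')

# Python 2 decodes byte strings as ASCII when they are formatted into a
# unicode template. Doing that decode explicitly reproduces the error.
try:
    encoded.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)


class Chunk(object):
    # Hypothetical constructor; the question assigns the fields later.
    def __init__(self, text, title):
        self.text = text      # assumed to be text (unicode), not bytes
        self.title = title

    def __unicode__(self):
        # Text in, text out: no implicit decode can happen here.
        return u'{0} {1}'.format(self.text, self.title)

    if sys.version_info[0] >= 3:
        # Python 3 has no __unicode__ protocol; str is already text.
        __str__ = __unicode__
    else:
        def __str__(self):
            # Python 2's print statement calls __str__, which returns bytes.
            return self.__unicode__().encode('utf-8')


chunk = Chunk(u'caf\xe9', u'na\xefve')
rendered = chunk.__unicode__()
```

On Python 2, `print c` goes through `__str__`, so a class that defines only `__unicode__` needs `print unicode(c)` instead. Encoding to UTF-8 in `__str__` assumes the terminal or output file expects UTF-8.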
    
kennytm

The simplest way to avoid Python 2.x's Unicode problems is to set the default encoding to UTF-8; otherwise such problems will keep arising in unexpected places:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')
RaSergiy
    This is a bad idea. Sure, problems will arise, and you need to fix them - you have to know what data (characters) you work with and handle all the edge cases. `setdefaultencoding` will just hide bugs. – Roman Bodnarchuk Nov 06 '12 at 07:46
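
One way to follow the comment's advice without touching the default encoding is to decode explicitly where the data enters the program. This is a sketch only: `read_text` is a hypothetical helper, the temporary file stands in for the question's external file, and the file is assumed to be UTF-8.

```python
import io
import os
import tempfile


def read_text(path, encoding='utf-8'):
    """Return the file's contents as text, decoding explicitly."""
    with io.open(path, 'r', encoding=encoding) as handle:
        return handle.read()


# Demo: write known UTF-8 bytes to a temporary file, then read them back as text.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as handle:
    handle.write(b'na\xc3\xafve caf\xc3\xa9')
try:
    loaded = read_text(demo_path)
finally:
    os.remove(demo_path)
```

With the input decoded once at the boundary, the rest of the code deals only with unicode objects, and encoding happens once more on output.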