7

I have a problem when printing (or writing to a file) non-ASCII characters in Python. I've resolved it by overriding the __str__ method in my own objects and calling "x.encode('utf-8')" inside it, where x is a property of the object.

But if I receive a third-party object and call "str(object)", and this object contains a non-ASCII character, it fails.

So the question is: is there any generic way to tell the str method that the object is UTF-8 encoded? I'm working with Python 2.5.4.

Peter Mortensen
  • What does "receive a third-party object" mean? What third-party object? And why can't this mysterious object be trusted to produce proper string values? – S.Lott Nov 10 '09 at 11:08
  • I'm interacting with other programs which are not made by me. Those programs can have objects with string properties which can contain non-ascii characters –  Nov 10 '09 at 11:27

5 Answers

10

There is no way to make str() work with Unicode in Python < 3.0.

Use repr(obj) instead of str(obj). repr() will convert the result to ASCII, properly escaping everything that isn't in the ASCII code range.
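As a related sketch (not the repr() call itself, but a different, explicit technique): the "backslashreplace" error handler also yields ASCII-only output with escapes for everything outside the ASCII range. The sample string below is a hypothetical value chosen for illustration.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample value containing two non-ASCII characters.
text = u"caf\u00e9 \u0411"

# "backslashreplace" swaps every character that ASCII cannot
# represent for a backslash escape instead of raising an error.
ascii_bytes = text.encode("ascii", "backslashreplace")
```

The result contains only ASCII bytes, so it can be printed or logged on any terminal, at the cost of showing escapes rather than the real characters.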

Other than that, use a file object which allows unicode. So don't encode at the input side but at the output side:

import codecs
fileObj = codecs.open("someFile", "w", "utf-8")

Now you can write unicode strings to fileObj and they will be converted as needed. To make the same happen with print, you need to wrap sys.stdout:

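import sys, codecs, locale
# Show the encoding Python detected for stdout (may be None when piped)
print str(sys.stdout.encoding)
# Wrap stdout so unicode strings are encoded with the locale's preferred encoding
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
line = u"\u0411\n"
print type(line), len(line)
sys.stdout.write(line)
print line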
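
To see what the wrapper does without touching the real stdout, here is a minimal sketch that wraps an in-memory byte stream instead. The io.BytesIO buffer is an assumption standing in for sys.stdout, and "utf-8" is hard-coded where the code above asks the locale. On Python 3, sys.stdout is already a text stream, so this kind of wrapping only matters on Python 2.

```python
import codecs
import io

# Stand-in for a byte-oriented stream such as Python 2's sys.stdout.
buf = io.BytesIO()

# getwriter() returns a StreamWriter class; calling it wraps the stream.
writer = codecs.getwriter("utf-8")(buf)

# Unicode text goes in, UTF-8 encoded bytes reach the underlying stream.
writer.write(u"\u0411\n")
encoded = buf.getvalue()
```

The encoding happens once, at the output boundary, which is the point of encoding on the output side rather than the input side.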
Aaron Digulla
  • But I have the same problem when I use print(object), because internally it calls str, so if the object has a non-ASCII character it will fail. I've seen that I can put this on the first line of my .py files: # -*- coding: utf-8 -*- but it doesn't work –  Nov 10 '09 at 11:20
  • The encoding of the source file has nothing to do with what `str()` supports. `str()` only supports unicode characters in py3k, so either use repr() or unicode() everywhere. – Aaron Digulla Nov 10 '09 at 11:27
4
none_ascii = '''
        ███╗   ███╗ ██████╗ ██╗   ██╗██╗███████╗███████╗ 
        ████╗ ████║██╔═══██╗██║   ██║██║██╔════╝██╔════╝ 
        ██╔████╔██║██║   ██║██║   ██║██║█████╗  ███████╗ 
        ██║╚██╔╝██║██║   ██║╚██╗ ██╔╝██║██╔══╝  ╚════██║ 
        ██║ ╚═╝ ██║╚██████╔╝ ╚████╔╝ ██║███████╗███████║ 
        ╚═╝     ╚═╝ ╚═════╝   ╚═══╝  ╚═╝╚══════╝╚══════╝ 
'''

print(none_ascii.decode('utf-8'))
Jeeva
3

How about using unicode(object) and defining a __unicode__ method on your classes?

Then you know it's Unicode and you can encode it any way you want when writing to a file.
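
A minimal sketch of that idea, assuming a hypothetical Person class with a name attribute (neither is from the question). On Python 2 you would call unicode(person); the sketch calls __unicode__() directly so the same code also runs on Python 3, where unicode() no longer exists.

```python
import sys


class Person(object):  # hypothetical example class
    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        # Always return text (unicode), never encoded bytes.
        return u"Person: " + self.name

    if sys.version_info[0] < 3:
        def __str__(self):
            # Python 2: str() must return bytes, so encode here.
            return self.__unicode__().encode("utf-8")
    else:
        # Python 3: str is already unicode text.
        __str__ = __unicode__


person = Person(u"Jos\u00e9")
text = person.__unicode__()      # unicode(person) on Python 2
encoded = text.encode("utf-8")   # encode only when writing out
```

Keeping the object's text form as unicode and encoding only at the point of output lets the caller pick the encoding.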

Kugel
  • But then I'm in the same problem: if I receive a third party object and I use "unicode(object)", and the object has a non-ascii character, it will fail, won't it? –  Nov 10 '09 at 10:58
  • Besides, when I use "print(object)", internally it calls str method, so I can't use unicode –  Nov 10 '09 at 11:01
  • One more question: if I use Python 3, will I still have those problems? Does Python 3 do the conversion by itself? Does it accept non-ASCII characters by default? –  Nov 10 '09 at 11:24
  • All Python 3 strings are (what used to be) unicode by default. – mavnn Nov 10 '09 at 12:22
  • First, please realize that if you receive an array of bytes, which Python strings essentially are, there is no way to be sure what encoding it is in. If there are third-party objects that give you strings in a non-standard encoding, they should also tell you which encoding it is. – Kugel Nov 10 '09 at 18:42
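
The point in the last comment can be sketched as follows: the same bytes decode without error under two different encodings and give different text, so the bytes alone cannot tell you which one was intended. The sample character is an arbitrary choice for illustration.

```python
# U+00E9 (e with acute accent) encoded as UTF-8 is two bytes: 0xC3 0xA9.
raw = u"\u00e9".encode("utf-8")

# Decoding with the right encoding recovers the original character.
as_utf8 = raw.decode("utf-8")

# Decoding the same bytes as Latin-1 also succeeds, but yields two
# unrelated characters (U+00C3 and U+00A9) instead of one.
as_latin1 = raw.decode("latin-1")
```

Both calls succeed, which is why the producer of the bytes has to state the encoding.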
2

I would like to add that I've found a solution on Unix systems: exporting an environment variable, like this:

export LC_CTYPE="es_ES.UTF-8"

This way, output is encoded as UTF-8, so I can print or write whatever I need and it works fine.

0

Just paste these two lines at the top of your code:

#!/usr/local/bin/python
# coding: latin-1

See https://www.python.org/dev/peps/pep-0263/ for further details.