
I am getting this error:

File "run.py", line 37, in <module>
 print str1
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 24-29: ordinal not in range(256)

This happens when I simply try to print some Japanese text. The string actually looks like this:

\u5149\u66dc\u65e5\u3067\u30e9\u30c6 \u30d4\u30af\u30b7\u30fc\u4e71\u7372\u884c\u304d\u307e\u3059 \u5e0c\u671b\u8005\u52df\u96c6\u4e2d\u3067\u3059\uff3e\uff3e

It comes from a JSON file. How can I print it?

Code:

import simplejson
import urllib2

url = "http://www.blah.com/json"
try:
  result = simplejson.load(urllib2.urlopen(url))
except IOError:
  print "Cannot open URL"
  result = {"msg": []}  # empty fallback so the loop below is skipped

for msg in result["msg"]:
  str1 = msg["character"] + " : " + msg["message"]
  print str1

repr(str1) is

u'Anys : \u5149\u66dc\u65e5\u3067\u30e9\u30c6 \u30d4\u30af\u30b7\u30fc\u4e71\u7372\u884c\u304d\u307e\u3059 \u5e0c\u671b\u8005\u52df\u96c6\u4e2d\u3067\u3059\uff3e\uff3e'

print(sys.stdout.encoding) is

ISO-8859-1
Zeno
  • What does your code look like? What encoding is your text in? Latin-1, as the name would imply, can't encode Japanese characters. – Wooble Sep 02 '11 at 17:18
  • Please post `repr(str1)`, and `print(sys.stdout.encoding)` – unutbu Sep 02 '11 at 17:19
  • What system is this running on? It's detecting sys.stdout as latin-1 encoded, and you can't write Japanese characters in Latin 1. – Thomas K Sep 02 '11 at 17:19
  • I've added the full code, sorry about that. – Zeno Sep 02 '11 at 17:21
  • unutbu: I edited those into my post – Zeno Sep 02 '11 at 17:25
  • Possible duplicate of [UnicodeEncodeError: 'latin-1' codec can't encode character](http://stackoverflow.com/questions/3942888/unicodeencodeerror-latin-1-codec-cant-encode-character) – ivan_pozdeev Jan 23 '17 at 23:21

1 Answer


You see this error because your terminal uses latin-1 as its encoding. As a side note, you can check the encoding of your terminal (assuming that it's your stdout) by running this in your shell:

$ python -c "import sys; print sys.stdout.encoding"

To print in UTF-8, encode your string to UTF-8 manually, like this:

s = u"\u5149\u66dc\u65e5\u3067\u30e9\u30c6 \u30d4\u30af\u30b7\u30fc\u4e71\u7372\u884c\u304d\u307e\u3059 \u5e0c\u671b\u8005\u52df\u96c6\u4e2d\u3067\u3059\uff3e\uff3e"
print s.encode('utf-8')
#Output: 光曜日でラテ ピクシー乱獲行きます 希望者募集中です^^
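
To make the difference between the two codecs concrete, here is a minimal sketch, written to run on both Python 2 and 3, using a shortened sample of the string from the question. The helper name `encode_for` is hypothetical and only illustrates a lossy fallback; it is not part of any library.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Shortened sample of the text from the question.
s = u"\u5149\u66dc\u65e5\u3067\u30e9\u30c6"


def encode_for(text, encoding):
    """Hypothetical helper: encode text, replacing unencodable characters with '?'."""
    return text.encode(encoding, "replace")


# Latin-1 has no Japanese characters, so a strict encode raises.
try:
    s.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

utf8_bytes = s.encode("utf-8")        # UTF-8 can represent every character here
fallback = encode_for(s, "latin-1")   # lossy: unencodable characters become '?'

print(latin1_ok)
print(len(utf8_bytes))
print(fallback)
```

The UTF-8 bytes only display correctly if the terminal itself interprets its input as UTF-8; the `"replace"` fallback never raises but loses the original text.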
mouad
  • Thanks, that did it. For some reason I was trying a different .encode() line that someone said to do elsewhere and it was generating another error. – Zeno Sep 02 '11 at 17:27
  • I do not believe this is the correct answer. You should not manually encode to UTF-8. You should set your output stream encoding to that. – tchrist Sep 02 '11 at 17:50
  • The above (`print`ing bytes) will work on Unix if your terminal character encoding is UTF-8. But as tchrist points out, it is not a general solution. On Windows, the console is character based, not byte based. Alternatively try `sys.stdout = codecs.getwriter('UTF-8')(sys.stdout)` then simply `print s`. – wberry Sep 02 '11 at 19:08
  • @wberry, @tchrist: AFAIK the Windows console only displays 256 characters (cp437), and setting the stdout encoding to `UTF8` is not recommended (see: http://wiki.python.org/moin/PrintFails). AFAIK there is no general solution for print encoding failures in Python 2; I think the solution will vary depending on what you're doing. If you want to log, use the `logging` module instead of print; if you just want to debug your program, I would go with my solution (encode with the right encoding when needed). Things can also start to get ugly when using subprocess or threads in your program ... – mouad Sep 02 '11 at 23:53
  • There's no general solution for the Windows console even in Python 3. Limitations in the Windows implementation of the C stdio library and the console itself prevent reliable Unicode output. This affects many other languages and tools too. – bobince Sep 03 '11 at 09:05
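
The stream-wrapping alternative mentioned in the comments can be sketched roughly as follows. To keep the example checkable without a real terminal, an `io.BytesIO` object stands in for a byte-oriented stdout; that stand-in is an assumption for the demo, not part of the original suggestion. On Python 2 the same wrapper would be applied to `sys.stdout` itself; on Python 3 `sys.stdout` is already a text stream, so the wrapper would go around `sys.stdout.buffer` instead.

```python
import codecs
import io

s = u"\u5149\u66dc\u65e5"

# io.BytesIO stands in for a byte-oriented stdout (demo assumption).
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; instantiating it wraps the
# byte stream so that unicode text written to it is encoded as UTF-8.
writer = codecs.getwriter("utf-8")(raw)

writer.write(s)
writer.write(u"\n")

written = raw.getvalue()  # the bytes that would have reached the terminal
```

With the wrapper in place, the calling code writes unicode text and never calls `.encode()` itself. Whether the result displays correctly still depends on the terminal, as the comments above point out for the Windows console.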