3

simple test program of an encoding issue:

#!/bin/env python
# -*- coding: utf-8 -*-
print u"Råbjerg"      # >>> unicodedata.name(u"å") = 'LATIN SMALL LETTER A WITH RING ABOVE'

here is what i get when i use it from a debian command box, i do not understand why using redirect here broke the thing, as i can see it correctly when using without.

can someone help to understand what i have missed? and what should the right way to print this characters so that they are ok everywhere?

$ python testu.py
Råbjerg

$ python testu.py > A
Traceback (most recent call last):
  File "testu.py", line 3, in <module>
    print u"Råbjerg"
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 1: ordinal not in range(128)

using debian Debian GNU/Linux 6.0.7 (squeeze) configured with:

$ locale
LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=

EDIT: from other similar questions seen later from the pointing done below

#!/bin/env python1
# -*- coding: utf-8 -*-
import sys, locale
s = u"Råbjerg"      # >>> unicodedata.name(u"å") = 'LATIN SMALL LETTER A WITH RING ABOVE'
if sys.stdout.encoding is None: # if it is a pipe, seems python2 return None
    s = s.encode(locale.getpreferredencoding())
print s
user1340802
  • 1,157
  • 4
  • 17
  • 36
  • ok, thanks and sorry, it is effectively like the post you pointed, and the explication is interesting over there. Just for any reference using `import locale, sys ; print sys.stdout.encoding, locale.getpreferredencoding()` can help to understand the pipe behaviour tty encoding vs None that can default to `ascii` when redirect. – user1340802 Jul 03 '13 at 12:33

3 Answers3

5

When redirecting the output, sys.stdout is not connected to a terminal and Python cannot determine the output encoding. When not directing the output, Python can detect that sys.stdout is a TTY and will use the codec configured for that TTY when printing unicode.

Set the PYTHONIOENCODING environment variable to tell Python what encoding to use in such cases, or encode explicitly.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
3

Use: print u"Råbjerg".encode('utf-8')

Similar question was asked today : Understanding Python Unicode and Linux terminal

Community
  • 1
  • 1
Ashwini Chaudhary
  • 244,495
  • 58
  • 464
  • 504
2

I'll suggest you to output it already encoded:

print u"Råbjerg".encode('utf-8')

This will write the correct bytes of the string in utf-8 and you'll be able to see in almost every editor/console which support utf-8

Paulo Bu
  • 29,294
  • 6
  • 74
  • 73