
I have the following Python script:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

print('☺')

When I run it on my Debian system, it produces the following output, as expected:

$ ./test.py 
☺
$ 

However, when I change the locale to "C" by setting the LANG environment variable, the script raises a UnicodeEncodeError:

$ LANG=C ./test.py
Traceback (most recent call last):
  File "./test.py", line 4, in <module>
    print('\u263a')
UnicodeEncodeError: 'ascii' codec can't encode character '\u263a' in position 0: ordinal not in range(128)
$ 

This problem prevents the script from running in minimal environments, such as during boot or on embedded systems. I also suspect that many existing Python programs can be broken by executing them with LANG=C. Here's an example on Stack Overflow of a program that presumably broke because it was executed in the "C" locale.

Is this a bug in Python? What's the best way to prevent this?

Jaap Joris Vens
  • "Is this a bug in Python?" You're telling Python to print WHITE SMILING FACE in a locale that doesn't have it. I don't think that's Python's fault. – jwodder Aug 29 '16 at 12:49

1 Answer


This happens because Python 3 uses the locale settings to determine the output character encoding. Specifically, it uses the value that the locale command displays for LC_CTYPE:

% locale 
...
LC_CTYPE="en_US.UTF-8"
...

If you force LC_CTYPE to C (which is what LANG=C does when neither LC_ALL nor LC_CTYPE is set), Python assumes that ASCII should be used as the output encoding, and ASCII has no mapping for U+263A.
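
To see which encoding Python picked up from the locale, something like the following sketch may help. The helper name can_encode is made up for illustration; sys.stdout.encoding and locale.getpreferredencoding are standard library. Running it once normally and once with LANG=C should show the difference, although the exact names reported depend on the Python version and platform.

```python
import locale
import sys


def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Encoding Python derived for standard output (may be None if stdout was replaced).
    stdout_encoding = getattr(sys.stdout, "encoding", None)
    print("stdout encoding:", stdout_encoding)
    # Encoding implied by the current locale settings (LC_CTYPE).
    print("locale encoding:", locale.getpreferredencoding(False))
    # '\u263a' is WHITE SMILING FACE; written as an escape so this line is pure ASCII.
    print("smiley fits in ascii:", can_encode("\u263a", "ascii"))
    print("smiley fits in utf-8:", can_encode("\u263a", "utf-8"))
```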

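If you want Python to encode Unicode output properly, set LC_CTYPE to an appropriate value (for example, a UTF-8 locale such as the en_US.UTF-8 shown above), or write already-encoded bytes directly to file descriptor 1 (standard output).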
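
As a rough sketch of the second option, the script can encode the text itself and hand the bytes to file descriptor 1, bypassing the locale-derived encoding entirely. The function names below are hypothetical, and UTF-8 is an assumption: the terminal still has to interpret the bytes as UTF-8 for the character to display correctly.

```python
import os
import sys


def encode_line(text, encoding="utf-8"):
    """Encode `text` plus a trailing newline; UTF-8 is an assumed default."""
    return (text + "\n").encode(encoding)


def write_bytes_to_stdout(text):
    """Write encoded bytes straight to fd 1, ignoring the locale's encoding."""
    # Flush any pending text output first so the ordering is preserved.
    sys.stdout.flush()
    # os.write may write fewer bytes than requested; fine for a short line,
    # but a loop would be needed for large payloads.
    os.write(1, encode_line(text))


if __name__ == "__main__":
    write_bytes_to_stdout("\u263a")
```

Within Python, sys.stdout.buffer.write() is the more common way to reach the same byte stream; os.write(1, ...) is used here only because it maps most directly onto "fd 1".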

  • Does this mean that, in order to make a script portable, I should always test what the current locale setting is before I try to print Unicode strings? – Jaap Joris Vens Aug 29 '16 at 14:05
  • @hedgie You can't make a truly portable script that prints non-ASCII characters. There are too many terminals that don't support it. – bobince Aug 30 '16 at 07:56
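
Rather than testing the locale before every print, one possible approach is to degrade gracefully: encode with a lenient error handler so that unsupported characters are replaced instead of raising. This is only a sketch, and to_encodable and safe_print are hypothetical names, not anything proposed in the thread.

```python
import sys


def to_encodable(text, encoding, errors="backslashreplace"):
    """Return a version of `text` that is guaranteed to encode in `encoding`."""
    # Characters the encoding lacks become escapes such as \u263a
    # (or '?' if errors="replace" is passed instead).
    return text.encode(encoding, errors=errors).decode(encoding)


def safe_print(text):
    """Print `text`, substituting characters the output encoding cannot represent."""
    # Fall back to ASCII if stdout has no usable encoding attribute.
    encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    print(to_encodable(text, encoding))


if __name__ == "__main__":
    safe_print("\u263a")
```

Under a UTF-8 locale this prints the character unchanged; under LANG=C on a Python that selects ASCII, it would print the escape sequence instead of failing.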