
The question

I have the following simple script:

test.py

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.info("€")

Depending on how this script is called, it produces the following error:

UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 10: ordinal not in range(128)

Why is it doing this and what can I do to fix it?

What I have already found out

Observation

When I call this script "normally", there is no problem:

$ python3 test.py 
INFO:root:€

However when I create a PHP file /var/www/html/test.php:

<?php
echo "# locale\n\n";
passthru("locale");
echo "\n\n# python\n\n";
passthru("python3 /var/www/html/test.py 2>&1");

and then call this file via Apache, I get the error:

$ curl localhost/test.php
# locale

LANG=C
LANGUAGE=de_DE.UTF-8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=


# python

--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.5/logging/__init__.py", line 983, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 10: ordinal not in range(128)
Call stack:
  File "/var/www/html/test.py", line 5, in <module>
    logging.info("\u20ac")
Message: '\u20ac'
Arguments: ()

For comparison this is what I get if I call locale directly:

$ locale
LANG=de_DE.UTF-8
LANGUAGE=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=de_DE.UTF-8

If I change my passthru-call in PHP to the following:

passthru("LANG=de_DE.UTF-8 python3 /var/www/html/test.py 2>&1");

then everything works fine.

Where is LANG=C coming from? Not from here:

cat /etc/default/locale
#  File generated by update-locale
LANG=de_DE.UTF-8
LC_ALL=de_DE.UTF-8
LANGUAGE=de_DE.UTF-8

System is Raspbian GNU/Linux 9 (stretch).

Interpretation

Obviously the script's success depends on the settings of the user running it. I used to think that Python scripts are mostly portable across systems. Now I have learned they are not even portable from one user to another ;-). Of course it is fine that environment variables change the appearance of the application in question, but it is not so fine that a particular setting is guaranteed to break the whole application.

I assume I either need to change my Python script to force UTF-8 (I am not sure why this is not the default behavior, given that the alternative fails anyway) or I need to set the LANG variable for the PHP scripts. For both options the question is: what is the simplest/shortest/most effective way to do that? Ideally there is a single option I can change to fix this globally for the whole system. Root access is available.

Note that I am currently stuck on Python 3.5.3 and cannot easily upgrade.

yankee

2 Answers


If you are unable to upgrade to Python 3.7+ (where the C locale is coerced to a UTF-8 locale or UTF-8 mode is enabled automatically, so the script works even with LANG=C):

$ LANG=C python3.7 script.py
INFO:root:€

and unable to control the environment when the script is called (where you could set LANG=C.UTF-8, a language-agnostic UTF-8 locale, or a more specific language-country locale such as de_DE.UTF-8):

$ LANG=C.UTF-8 python3.5 script.py
INFO:root:€

and can't force the encoding via PYTHONIOENCODING:

$ LANG=C PYTHONIOENCODING=UTF-8 python3.5 script.py
INFO:root:€

then you've got a few gross choices like re-opening the standard streams:

$ cat script.py
import logging
import sys

# Re-open the standard streams as line-buffered UTF-8 text files,
# regardless of what the locale says.
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
sys.stderr = open(sys.stderr.fileno(), mode='w', encoding='utf8', buffering=1)

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.info("€")
$ LANG=C python3.5 script.py
INFO:root:€

or some sort of redirection / re-exec with the proper environment / locale (though of course, exec isn't portable: it only works reasonably well on POSIX-like systems and does not work at all on Windows):

$ cat script.py
import os
import logging
import sys

# Re-exec the interpreter once with PYTHONIOENCODING set; the check
# prevents an endless exec loop.
if os.getenv('PYTHONIOENCODING') != 'UTF-8':
    cmd = [sys.executable, *sys.argv]
    os.execvpe(cmd[0], cmd, {**os.environ, 'PYTHONIOENCODING': 'UTF-8'})

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.info("€")
$ LANG=C python3.5 script.py
INFO:root:€
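
A somewhat less invasive variation on re-opening the streams is to leave sys.stdout alone and give only the logging handler a UTF-8 text wrapper around the same underlying binary buffer. The following is a minimal sketch, not a drop-in fix: the helper names utf8_stream and configure_logging are made up for illustration, and it assumes the stream exposes a binary .buffer attribute, as sys.stdout and sys.stderr normally do. It avoids f-strings so it should also run on Python 3.5.

```python
import io
import logging
import sys


def utf8_stream(stream):
    """Return a line-buffered UTF-8 text stream over stream's binary buffer.

    Hypothetical helper. Assumes `stream` has a `.buffer` attribute,
    which is true for the normal sys.stdout / sys.stderr objects.
    """
    # Flush pending text so output written through the old wrapper
    # does not end up after output written through the new one.
    stream.flush()
    return io.TextIOWrapper(stream.buffer, encoding="utf-8", line_buffering=True)


def configure_logging(stream=None):
    """Attach a UTF-8 StreamHandler to the root logger (hypothetical helper)."""
    if stream is None:
        stream = sys.stdout
    handler = logging.StreamHandler(utf8_stream(stream))
    # Same layout as the default basicConfig format.
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.DEBUG)
    return handler
```

In the script from the question, the logging.basicConfig(...) line would be replaced by a call to configure_logging(), followed by logging.info("€") as before. Plain print() calls are not affected by this and would still use the locale's encoding.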
anthony sottile
  • `[..]and unable to control the environment when the script is called[..]` I am not necessarily unable. I do have root access to the system in question. I am just unaware where "LANG=C" is coming from. Maybe I can fix that... I would prefer such a "global" solution, because multiple python scripts are affected and if I fix each call individually there is the danger of forgetting one. – yankee Jul 31 '22 at 16:14
  • I edited my question to be more specific and added contents of /etc/default/locale. – yankee Jul 31 '22 at 16:45

As indicated in this answer, the LANG environment variable used by Apache is set in /etc/apache2/envvars. The file contains these lines:

## The locale used by some modules like mod_dav
export LANG=C
## Uncomment the following line to use the system default locale instead:
#. /etc/default/locale

export LANG

The default value is C, but by uncommenting the line mentioned above you can use the system locale instead.

Once that is done and Apache has been restarted, the Python script will inherit the correct locale.
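
To check what the Python process actually inherits, a small diagnostic script can be run through the same passthru() call before and after the change. This is only a sketch under assumptions: the function names encoding_report and format_report are invented here, and the set of variables shown is a selection that seemed relevant, not an exhaustive list.

```python
import locale
import os
import sys


def encoding_report(environ=None):
    """Collect settings that influence how Python encodes stdout.

    Hypothetical helper; `environ` defaults to the real environment.
    """
    if environ is None:
        environ = os.environ
    return {
        "LANG": environ.get("LANG"),
        "LC_ALL": environ.get("LC_ALL"),
        "LC_CTYPE": environ.get("LC_CTYPE"),
        "PYTHONIOENCODING": environ.get("PYTHONIOENCODING"),
        # Encoding Python derives from the locale settings.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding actually used by the current stdout object.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


def format_report(report):
    """Render the report as sorted KEY=value lines (ASCII-safe keys)."""
    return "\n".join("{}={}".format(key, report[key]) for key in sorted(report))
```

Calling print(format_report(encoding_report())) at the top of a script shows at a glance whether the process still sees LANG=C and an ASCII stdout encoding, or the system locale and UTF-8.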

Olivier