The question
I have the following simple script:
test.py
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.info("€")
Depending on how the script is invoked, it fails with this error:
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 10: ordinal not in range(128)
Why is it doing this and what can I do to fix it?
What I have already found out
Observation
When I run the script "normally", there is no problem:
$ python3 test.py
INFO:root:€
However when I create a PHP file /var/www/html/test.php:
<?php
echo "# locale\n\n";
passthru("locale");
echo "\n\n# python\n\n";
passthru("python3 /var/www/html/test.py 2>&1");
and then call this file via Apache, I get the error:
$ curl localhost/test.php
# locale
LANG=C
LANGUAGE=de_DE.UTF-8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=
# python
--- Logging error ---
Traceback (most recent call last):
File "/usr/lib/python3.5/logging/__init__.py", line 983, in emit
stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 10: ordinal not in range(128)
Call stack:
File "/var/www/html/test.py", line 5, in <module>
logging.info("\u20ac")
Message: '\u20ac'
Arguments: ()
For comparison, this is what I get if I call locale directly:
$ locale
LANG=de_DE.UTF-8
LANGUAGE=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=de_DE.UTF-8
If I change my passthru call in PHP to the following:
passthru("LANG=de_DE.UTF-8 python3 /var/www/html/openWB/test.py 2>&1");
then everything works fine.
Where is LANG=C coming from? Not from here:
cat /etc/default/locale
# File generated by update-locale
LANG=de_DE.UTF-8
LC_ALL=de_DE.UTF-8
LANGUAGE=de_DE.UTF-8
System is Raspbian GNU/Linux 9 (stretch).
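To compare what Python itself sees in the two contexts, a small diagnostic along these lines could be run once from the shell and once through passthru. It is only a sketch: the function name describe_io_encoding is made up for illustration, and it reports the encodings Python derived from the environment rather than explaining where LANG=C comes from.

```python
import locale
import sys


def describe_io_encoding():
    """Collect the encodings Python derived from the current environment.

    Hypothetical helper for illustration; not part of the original script.
    """
    return {
        # Encoding used when text is written to sys.stdout
        # (None if stdout was replaced by an object without one).
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the locale settings (LANG, LC_ALL, LC_CTYPE).
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used for file names.
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_io_encoding().items()):
        print("{}: {}".format(name, value))
```

If the Apache run reports an ASCII-based stdout encoding (Python may show it under a name such as ANSI_X3.4-1968) while the shell run reports UTF-8, that would match the UnicodeEncodeError shown above.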
Interpretation
Obviously the script's success depends on the environment of the user running it. I used to think that Python scripts are mostly portable across systems. Now I have learned they are not even portable from one user to another ;-). Of course it is fine for environment variables to change how an application behaves, but it is not fine that a particular setting is guaranteed to break the whole application.
I assume I either need to change my Python script to force UTF-8 (I am not sure why this is not the default behavior when everything else fails anyway) or set the LANG variable for the PHP scripts. For both options the question is: what is the simplest/shortest/most effective way to do that? Ideally there is a single option I can change to fix this globally for the whole system. Root access is available.
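To make the first option concrete, here is a minimal sketch of what "forcing UTF-8" inside the script might look like on Python 3.5, assuming it is acceptable to wrap the binary buffer behind sys.stdout in a new text stream. The helper name make_utf8_text_stream is hypothetical, and I have not verified this under Apache.

```python
import io
import logging
import sys


def make_utf8_text_stream(binary_stream):
    """Wrap a binary stream in a text stream that always encodes as UTF-8.

    Hypothetical helper for illustration. The locale is ignored here
    because the encoding is passed explicitly.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


if __name__ == "__main__":
    # Keep a reference to the wrapper for the lifetime of the program:
    # when it is garbage collected it closes the underlying buffer.
    utf8_stdout = make_utf8_text_stream(sys.stdout.buffer)
    logging.basicConfig(stream=utf8_stdout, level=logging.DEBUG)
    logging.info("€")
```

For the second option, Python also reads the PYTHONIOENCODING environment variable, so prefixing the command with PYTHONIOENCODING=utf-8 instead of LANG=de_DE.UTF-8 should affect only the standard streams' encoding and leave the rest of the locale untouched.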
Note that I am currently stuck on Python 3.5.3 and cannot easily upgrade.