I have googled this and tried every solution I could find, and nothing is working. I am using Python 3. I read a string from a form and try to print/write it. Everything is fine unless it contains non-ASCII characters (I am testing with Greek text).
form = cgi.FieldStorage()
name = form.getvalue("Name")
sys.stderr.write(name)
print(name)
The write outputs Unicode escape sequences (e.g. \u03bc\u03b5\u03c4\u1f70), which is not what I want, and the print crashes with a
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
So for some reason print and write treat it differently, which is just weird.
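One possible explanation for the difference, offered as a sketch rather than a diagnosis of this server: in Python 3, sys.stdout encodes with the "strict" error handler while sys.stderr uses "backslashreplace". Under an ASCII encoding the first raises and the second emits escapes. The snippet below reproduces both behaviours with in-memory streams; write_with is a hypothetical helper and the sample text is the four characters from the escape sequence above.

```python
import io

SAMPLE = "\u03bc\u03b5\u03c4\u1f70"  # the four Greek characters from the post


def write_with(errors, text=SAMPLE):
    """Write text through an ASCII text stream using the given error handler."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# Like sys.stderr: characters that cannot be encoded become \uXXXX escapes.
print(write_with("backslashreplace"))

# Like sys.stdout: the same text raises instead of being escaped.
try:
    write_with("strict")
except UnicodeEncodeError as exc:
    print("strict handler raised:", exc.reason)
```

If this is what is happening, print and write are not treating the string differently; the two streams are just configured with different error handlers.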
Here is everything I have tried to get it to print out the text in its original form (as Greek letters):
print(name.encode("UTF-8"))
This prints it in the wrong format similar to what write did above. The following all crash with the same/similar error:
print(name.encode("UTF-8").decode("UTF-8")) # crashes with same error
ba = bytearray(name,"UTF-8")
n2 = ba.decode("UTF-8")
print(n2) # also crashes
unic = u'' # Nope. Errors still.
unic +=name
print(unic) # also crashes
print(b'{name}') #Prints b'{name}' literally.
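For reference, print() on a bytes object shows its repr (b'...') rather than the raw bytes, which would explain the "wrong format" from print(name.encode("UTF-8")). A minimal sketch of sending the encoded bytes straight to the binary layer underneath sys.stdout, bypassing the text encoding entirely; write_utf8 is a hypothetical helper name, not something from the original script.

```python
import sys


def write_utf8(text, binary_stream=None):
    """Encode text as UTF-8 and write the raw bytes, followed by a newline."""
    if binary_stream is None:
        # sys.stdout.buffer is the byte stream that sys.stdout wraps.
        binary_stream = sys.stdout.buffer
    binary_stream.write(text.encode("utf-8") + b"\n")
    binary_stream.flush()
```

In the script this would be called as write_utf8(name). Mixing it with ordinary print() calls can reorder output unless sys.stdout is flushed first.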
If I run similar code locally (instead of on a webserver and getting the string as a response), everything works fine. Somehow the string I am getting back is acting differently and I cannot for the life of me figure out why.
So what very simple thing am I missing here?
In case it is relevant, I executed locale (I am using CentOS 7) and got the following:
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Update
I printed sys.stdout.encoding from the script and it returns ANSI_X3.4-1968 (i.e. ASCII), so that may be the problem. Strangely, when I run the same command from a python3 command-line prompt I get UTF-8. Now I guess I need to figure out how to set the encoding for when it runs from the webserver.
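One way to set the encoding from inside the script, sketched under the assumption that the output really should be UTF-8: switch sys.stdout to UTF-8 regardless of what the environment reports. force_utf8 is a hypothetical helper; TextIOWrapper.reconfigure() only exists on Python 3.7 and later, so the sketch falls back to re-wrapping the underlying byte stream on older versions.

```python
import io
import sys


def force_utf8(stream):
    """Return a text stream that writes UTF-8 to the same underlying bytes."""
    if hasattr(stream, "reconfigure"):
        # Python 3.7+: change the encoding of the existing stream in place.
        stream.reconfigure(encoding="utf-8")
        return stream
    # Older Python 3: wrap the binary buffer in a new UTF-8 text layer.
    return io.TextIOWrapper(stream.buffer, encoding="utf-8", line_buffering=True)
```

Near the top of the script this would be used as sys.stdout = force_utf8(sys.stdout), before anything is printed.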
Update 2
I added the following:
A = subprocess.run(["locale"])
print(A.stdout)
And the output is:
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL= None
So clearly the locale is set differently when the script runs from Apache than when it runs from the command line. Hmmm...
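A side note on the snippet in this update: subprocess.run() does not capture output unless asked to, so A.stdout is None. The locale output went straight to the script's own stdout, and the trailing None on the last line is presumably the result of print(A.stdout). A minimal sketch that captures the text instead; run_and_capture is a hypothetical helper, and the keyword arguments are chosen so that it also works on Python versions older than 3.7.

```python
import subprocess


def run_and_capture(cmd):
    """Run cmd and return whatever it wrote to stdout, decoded as text."""
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,   # capture instead of inheriting our stdout
        universal_newlines=True,  # decode bytes to str
    )
    return completed.stdout

# In the script: print(run_and_capture(["locale"]))
```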
Update 3
I tried adding the following lines to /etc/sysconfig/httpd and restarted Apache, but there was no change. (The first two were suggested in one place, the third in another, although none of the sources said WHERE to put them. For CentOS 7, the file I tried seemed to be the logical one, but apparently it is not.)
export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'
export PYTHONIOENCODING=utf-8
Update 4
I tried locale.setlocale(locale.LC_ALL,'en_US.UTF-8') in my script and it didn't help.
Also, strangely, there is the following in my old error_log files for httpd, but not in the current one (so the last time this was printed was several days ago):
Fatal Python error: Py_Initialize: Unable to get the locale encoding
This seems to track with what I am seeing--the environment variables are not being used/seen when my scripts run from Apache.
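To check that theory from inside the CGI script, something like the following could be logged. It is only a diagnostic sketch: encoding_report is a hypothetical helper, and the set of variables it inspects is just the ones mentioned in this post.

```python
import locale
import os
import sys


def encoding_report(environ=None):
    """Collect the encoding-related settings the running script actually sees."""
    environ = os.environ if environ is None else environ
    return {
        "LANG": environ.get("LANG"),
        "LC_ALL": environ.get("LC_ALL"),
        "PYTHONIOENCODING": environ.get("PYTHONIOENCODING"),
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


def log_report(stream=sys.stderr):
    """Write the report to stream (sys.stderr by default), one ASCII-safe line per entry."""
    for key, value in encoding_report().items():
        stream.write("%s=%r\n" % (key, value))
```

If the three environment entries come back as None when the script runs under Apache, the variables are not reaching the script's environment.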
Update 5
I found a hack that works. Instead of running my Python script directly from Apache, I run runMakeArt, which is as follows:
#!/usr/bin/sh
export PYTHONIOENCODING=utf-8 ; /usr/bin/python3 makeArt.py
and so far this seems to be working. In some sense maybe this is better than properly configuring Apache since if I move servers (hopefully not!), this should still work without worrying about whether or not Apache is configured correctly.