Before getting into the problem: I have read many Stack Overflow questions and Python bug reports about this issue, but I have not been able to find the root cause.
I am getting a UnicodeEncodeError on a CentOS machine. Python is not built on the machine itself; the virtual environment with the required Python version (3.6.7) is built elsewhere and copied over. To start the server, we activate the virtual environment and then launch it.
The issue is observed in two scenarios:
- logging an input request parameter that contains Unicode characters
- printing the same Unicode string from code; we pipe print statements to a log file, and I can see the error there
The error looks as follows:
print("\u6211\u7684\u7535\u8111\u603b\u662f\u51fa\u73b0Windows\u9700\u8981\u6fc0\u6d3b")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 56-63: ordinal not in range(128)
I verified the following in the Python terminal:
- sys.getdefaultencoding() - utf-8
- sys.getfilesystemencoding() - utf-8
- sys.stdout.encoding
- LANG is set to en_us.utf-8
- LC_ALL is not set
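The checks above were made in an interactive terminal, which may not match what the server process sees. A minimal sketch of a diagnostic that gathers the same values so it can be run the same way the server is started (virtual environment activated, output piped to a file). The function name `collect_encoding_info` is hypothetical, and the values are printed through `ascii()` so the report itself cannot raise an encoding error:

```python
import locale
import os
import sys


def collect_encoding_info():
    """Gather the encoding-related settings visible to this process."""
    stdout = sys.stdout
    info = {
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        "sys.stdout.encoding": getattr(stdout, "encoding", None),
        "sys.stdout.isatty()": bool(getattr(stdout, "isatty", lambda: False)()),
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    }
    # Environment variables that can influence the stream encoding.
    for name in ("LANG", "LC_ALL", "LC_CTYPE", "PYTHONIOENCODING"):
        info[name] = os.environ.get(name)
    return info


if __name__ == "__main__":
    for key, value in sorted(collect_encoding_info().items()):
        # ascii() keeps the report printable even on an ASCII-only stream.
        print("{} = {}".format(key, ascii(value)))
```

Running it once in the terminal and once with output redirected to a file (as the server does) would show whether `sys.stdout.encoding` differs between the two cases.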
I went through some solutions that suggest modifying LC_ALL or adding PYTHONIOENCODING to the environment variables, but I am reluctant to change those without knowing the side effects, since this is a production environment.
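As an illustration of how narrowly PYTHONIOENCODING can be scoped, here is a minimal sketch that sets it for a single child process only, leaving the parent process and the rest of the machine untouched. The helper name `stdout_encoding_with_override` is hypothetical, and whether this is appropriate for the actual server start-up is an assumption that would need checking:

```python
import os
import subprocess
import sys


def stdout_encoding_with_override(encoding):
    """Return sys.stdout.encoding as seen by a child Python process
    started with PYTHONIOENCODING set only in that child's environment."""
    child_env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=child_env,            # override applies to this child only
        stdout=subprocess.PIPE,   # stdout is a pipe, as with a log file
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(stdout_encoding_with_override("utf-8"))
```

The parent's own `os.environ` is not modified, so the change is limited to the one process that is launched with `child_env`.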
Edit: I tried printing the same characters that break the code above from the console, by opening a Python terminal, and it prints them without any issue. I tried printing this way:
import sys
print("日本語")
sys.stdout.write("日本語\n")
But when the same thing runs through the code, it raises UnicodeEncodeError.
How can I resolve this?
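The difference between the two cases can be reproduced without a terminal. A minimal sketch, assuming (not confirmed) that the stream the server writes to ends up using the ascii codec while the interactive terminal uses utf-8; `try_write` is a hypothetical helper that writes a short sample of the failing string to an in-memory stream:

```python
import io

# Short sample taken from the failing string above.
TEXT = "\u6211\u7684\u7535\u8111"


def try_write(encoding):
    """Write TEXT to an in-memory text stream that uses `encoding`."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    try:
        stream.write(TEXT + "\n")
        stream.flush()
    except UnicodeEncodeError as exc:
        return "failed: {}".format(exc.reason)
    return "ok: wrote {} bytes".format(len(raw.getvalue()))


if __name__ == "__main__":
    for name in ("ascii", "utf-8"):
        print("{}: {}".format(name, try_write(name)))
```

With an ascii stream the write fails with the same kind of error as in the log file; with a utf-8 stream it succeeds, which matches what I see in the terminal.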
Thanks