things to check
Here is what I found, in order of how I recommend checking them:
- environment variables LC_ALL, LANG, LC_CTYPE, LANGUAGE
- Python-specific environment variables PYTHONIOENCODING, PYTHONCOERCECLOCALE (their effect may be changed by the Python interpreter argument -E; check sys.flags.ignore_environment)
- Windows-specific console encoding PYTHONLEGACYWINDOWSSTDIO
- Python sys module
  - function sys.getdefaultencoding() (the companion function sys.setdefaultencoding was removed from Python 3)
  - sys.stdin.encoding
  - sys.stdout.encoding
  - sys.stderr.encoding
  - file system encoding setting sys.getfilesystemencoding()
- Python file header coding:, as in # -*- coding: utf-8 -*-, which affects how the parser interprets string literals in the source file.
- Python locale module
  - function call locale.nl_langinfo(locale.CODESET) (does not appear to work on Windows Python 3.7, worked on Debian Python 3.5)
  - function locale.getdefaultlocale
  - function locale.getpreferredencoding (works differently on some systems)
- Python gettext module and its various facilities (too many to list all of them)
  - contents of the directories passed to some functions like gettext.install(application, directory) or gettext.bindtextdomain(domain, directory)
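To illustrate the file header item above, here is a minimal sketch of how the coding declaration changes what a string literal becomes. The helper name literal_under is made up for this example; it compiles the same source bytes under two different declarations.

```python
def literal_under(declared):
    """Compile a tiny module whose string literal holds the UTF-8 bytes of
    U+00E9, using the given coding declaration, and return the parsed string.

    literal_under is a hypothetical helper written for this illustration.
    """
    header = ("# -*- coding: %s -*-\n" % declared).encode("ascii")
    # The bytes inside the quotes are always the UTF-8 encoding of U+00E9;
    # only the declared source encoding changes between calls.
    body = b's = "' + "\u00e9".encode("utf-8") + b'"\n'
    namespace = {}
    exec(compile(header + body, "<demo>", "exec"), namespace)
    return namespace["s"]


if __name__ == "__main__":
    for declared in ("utf-8", "latin-1"):
        # ascii() keeps the output printable whatever the console encoding is.
        print(declared, ascii(literal_under(declared)))
```

With the utf-8 declaration the literal is the single character U+00E9; with latin-1 the same two bytes are read as two separate characters.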
print the values
Here is a short script to list the values of most of these:
#!/usr/bin/env python3
#
# print various locale information

import locale
import os
import sys


def main():
    print("Python:")
    print(" version:", sys.version.replace("\n", " "))
    print("environment:")
    for env in (
        "LC_ALL",
        "LANG",
        "LC_CTYPE",
        "LANGUAGE",
        "PYTHONUTF8",
        "PYTHONIOENCODING",
        "PYTHONLEGACYWINDOWSSTDIO",
        "PYTHONCOERCECLOCALE",
    ):
        if env in os.environ:
            print(" \"%s\"=\"%s\"" % (env, os.environ[env]))
        else:
            print(" \"%s\" not set" % env)
    print(" -E (ignore PYTHON* environment variables) ?", bool(sys.flags.ignore_environment))
    print()
    print("sys module:")
    print(" sys.getdefaultencoding() \"%s\"" % sys.getdefaultencoding())
    print(" sys.stdin.encoding \"%s\"" % sys.stdin.encoding)
    print(" sys.stdout.encoding \"%s\"" % sys.stdout.encoding)
    print(" sys.stderr.encoding \"%s\"" % sys.stderr.encoding)
    print(" sys.getfilesystemencoding() \"%s\"" % sys.getfilesystemencoding())
    print()
    print("locale module:")
    # nl_langinfo is not available on every platform (for example Windows)
    if hasattr(locale, "nl_langinfo"):
        print(" locale.nl_langinfo(locale.CODESET) \"%s\""
              % locale.nl_langinfo(locale.CODESET))
    else:
        print(" locale.nl_langinfo not available")
    # the remaining functions do not exist in every Python version
    try:
        print(" locale.getencoding() \"%s\"" % locale.getencoding())
    except AttributeError:
        print(" locale.getencoding() not available")
    try:
        print(" locale.getlocale()", (locale.getlocale(),))
    except AttributeError:
        print(" locale.getlocale() not available")
    try:
        print(" locale.getpreferredencoding() \"%s\""
              % locale.getpreferredencoding())
    except AttributeError:
        print(" locale.getpreferredencoding() not available")
    try:
        print(" locale.getdefaultlocale()[1] \"%s\""
              % locale.getdefaultlocale()[1])
    except AttributeError:
        print(" locale.getdefaultlocale() not available")


if __name__ == "__main__":
    main()
printed values on three systems
- Windows 10 with Python 3.7
- Debian 9 with Python 3.5
- Ubuntu 14.04 with Python 3.4
On Windows 10 using Python 3.7 within the built-in PowerShell terminal, this prints
PS> python.exe print-locale.py
environment:
-E (ignore PYTHON* environment variables) ? False
"LC_ALL" not set
"LANG" not set
"LC_CTYPE" not set
"LANGUAGE" not set
"PYTHONIOENCODING"="UTF-8"
"PYTHONLEGACYWINDOWSSTDIO" not set
sys module:
getdefaultencoding "utf-8"
sys.stdin.encoding "UTF-8"
sys.stdout.encoding "UTF-8"
sys.stderr.encoding "UTF-8"
locale:
locale.nl_langinfo not available
locale.getdefaultlocale()[1] "cp1252"
locale.getpreferredencoding() "cp1252"
On Debian 9 using Python 3.5, this prints
$ python print-locale.py
environment:
-E (ignore PYTHON* environment variables) ? False
"LC_ALL" not set
"LANG"="en_GB.UTF-8"
"LC_CTYPE" not set
"LANGUAGE" not set
"PYTHONIOENCODING" not set
"PYTHONLEGACYWINDOWSSTDIO" not set
sys module:
getdefaultencoding "utf-8"
sys.stdin.encoding "UTF-8"
sys.stdout.encoding "UTF-8"
sys.stderr.encoding "UTF-8"
locale:
locale.nl_langinfo(locale.CODESET) "UTF-8"
locale.getdefaultlocale()[1] "UTF-8"
locale.getpreferredencoding() "UTF-8"
On Ubuntu 14.04 using Python 3.4, this prints
$ python print-locale.py
environment:
-E (ignore PYTHON* environment variables) ? False
"LC_ALL" not set
"LANG"="en_US.UTF-8"
"LC_CTYPE" not set
"LANGUAGE"="en_US:"
"PYTHONIOENCODING" not set
"PYTHONLEGACYWINDOWSSTDIO" not set
sys module:
getdefaultencoding "utf-8"
sys.stdin.encoding "UTF-8"
sys.stdout.encoding "UTF-8"
sys.stderr.encoding "UTF-8"
locale:
locale.nl_langinfo(locale.CODESET) "UTF-8"
locale.getdefaultlocale()[1] "UTF-8"
locale.getpreferredencoding() "UTF-8"
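The Windows listing reports UTF-8 for the standard streams but cp1252 for the locale, so the same text may print fine yet fail to encode elsewhere. A rough way to see what a reported encoding can represent is to try encoding a sample string with it. The helper can_encode below is a made-up name for this sketch, not part of any library.

```python
import locale
import sys


def can_encode(text, encoding):
    """Return True if text can be encoded with the named encoding.

    can_encode is a hypothetical helper written for this illustration.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    samples = ("\u20ac", "\u4e2d")  # euro sign, a CJK character
    reported = (
        ("sys.stdout.encoding", sys.stdout.encoding),
        ("locale.getpreferredencoding(False)", locale.getpreferredencoding(False)),
    )
    for label, encoding in reported:
        if not encoding:
            continue  # a replaced stream may have no encoding attribute value
        for sample in samples:
            print(label, encoding, ascii(sample), can_encode(sample, encoding))
```

For example, cp1252 has a euro sign but no CJK characters, while UTF-8 can encode both.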
Unfortunately, when I run into Unicode print problems with installed modules, it is not immediately obvious which setting is affecting that module. Understanding how these parameters and settings interact is even more confounding, and there are many combinations of settings to test.
But this little bit might help someone get started.
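One possible way to test a combination of settings without changing your own shell is to start a child interpreter with a modified environment and ask it what it sees. The sketch below is only a starting point; the helper name child_stdout_encoding and the latin-1 value are chosen for illustration. Note that the child's stdout is a pipe here, not a terminal, so the result can differ from what an interactive console reports.

```python
import os
import subprocess
import sys

# Code run by the child interpreter: report its own stdout encoding.
CHILD_CODE = "import sys; print(sys.stdout.encoding)"


def child_stdout_encoding(extra_env, ignore_env=False):
    """Return sys.stdout.encoding as reported by a child interpreter.

    child_stdout_encoding is a hypothetical helper written for this sketch.
    extra_env is merged over a copy of the current environment; with
    ignore_env=True the child is started with -E, so PYTHON* variables
    are ignored.
    """
    env = dict(os.environ)
    env.update(extra_env)
    args = [sys.executable]
    if ignore_env:
        args.append("-E")
    args += ["-c", CHILD_CODE]
    output = subprocess.check_output(args, env=env, universal_newlines=True)
    return output.strip()


if __name__ == "__main__":
    settings = {"PYTHONIOENCODING": "latin-1"}
    print("PYTHONIOENCODING honoured:", child_stdout_encoding(settings))
    print("started with -E:          ", child_stdout_encoding(settings, ignore_env=True))
```

The same pattern can be extended to other variables from the list (LC_ALL, LANG, PYTHONUTF8) by passing a different extra_env dictionary.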
Also see the helpful answers at the SO question "How to set sys.stdout encoding in Python 3?".
Related PEPs to review
Some help came from this PyMOTW article, the Python Unicode HOWTO, the Python sys module documentation, and the Python locale module documentation.