
When I run into Unicode printing problems, what should I check? In my particular case, an installed module is printing Unicode characters using the wrong codec.

Several disparate settings affect how Python encodes and decodes text, and in particular how it handles printable data, under different circumstances.

Some things off the top of my head:

  • general environment variables LC_ALL, LANG
  • Python sys module function sys.getdefaultencoding()

What else am I forgetting?


I'm only interested in Python 3.

JamesThomasMoon

1 Answer


things to check

Here is what I found, in order of how I recommend checking them:

  • environment variables LC_ALL, LANG, LC_CTYPE, LANGUAGE
  • Python-specific environment variables PYTHONIOENCODING, PYTHONUTF8, PYTHONCOERCECLOCALE
    (these are ignored when the interpreter is started with -E; check sys.flags.ignore_environment)
  • Windows-specific environment variable PYTHONLEGACYWINDOWSSTDIO, which affects the console encoding
  • Python sys module
    • function sys.getdefaultencoding()
      (its counterpart sys.setdefaultencoding was removed in Python 3)
    • sys.stdin.encoding
    • sys.stdout.encoding
    • sys.stderr.encoding
  • file system encoding setting sys.getfilesystemencoding()
  • Python source file encoding declaration, as in
    # -*- coding: utf-8 -*-

    which affects how the parser decodes the source file, including string literals.
  • locale module
    • function call locale.nl_langinfo(locale.CODESET)
      (not available on Windows with Python 3.7; worked on Debian with Python 3.5)
    • function locale.getdefaultlocale
      (deprecated since Python 3.11)
    • function locale.getpreferredencoding
      (the result differs between systems, e.g. cp1252 on Windows and UTF-8 on Linux in the output below)
  • gettext module and its various facilities (too many to list all of them)
  • contents of the directories passed to some functions like gettext.install(application, directory) or gettext.bindtextdomain(domain, directory)

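To see one of these settings take effect, the following minimal sketch starts a fresh interpreter with PYTHONIOENCODING set and asks it to report sys.stdout.encoding. The helper names child_stdout_encoding and same_codec are made up for this illustration, and the choice of latin-1 is arbitrary.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(extra_env=None, interpreter_args=()):
    """Start a fresh interpreter and return the sys.stdout.encoding it reports."""
    env = dict(os.environ)          # keep the parent's environment...
    env.update(extra_env or {})     # ...and override only what is under test
    cmd = [sys.executable, *interpreter_args, "-c",
           "import sys; print(sys.stdout.encoding)"]
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("ascii").strip()


def same_codec(a, b):
    """Compare encoding names by codec, so 'UTF-8' and 'utf8' count as equal."""
    return codecs.lookup(a).name == codecs.lookup(b).name


if __name__ == "__main__":
    forced = child_stdout_encoding({"PYTHONIOENCODING": "latin-1"})
    print("with PYTHONIOENCODING=latin-1:", forced)
    # With -E the PYTHON* variables are ignored, so the result falls back to
    # whatever the locale (or platform default) dictates on this machine.
    ignored = child_stdout_encoding({"PYTHONIOENCODING": "latin-1"}, ["-E"])
    print("same, but with -E:", ignored)
```

Comparing names through codecs.lookup avoids false mismatches, since the same codec can be reported under different spellings.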

print the values

Here is a short script to list the values of most of these:

#!/usr/bin/env python3
#
# print various locale information

import locale
import os
import sys


def main():

    print("Python:")
    print("  version:", sys.version.replace("\n", " "))

    print("environment:")
    for env in (
        "LC_ALL",
        "LANG",
        "LC_CTYPE",
        "LANGUAGE",
        "PYTHONUTF8",
        "PYTHONIOENCODING",
        "PYTHONLEGACYWINDOWSSTDIO",
        "PYTHONCOERCECLOCALE",
    ):
        if env in os.environ:
            print("  \"%s\"=\"%s\"" % (env, os.environ[env]))
        else:
            print("  \"%s\" not set" % env)
    print("  -E (ignore PYTHON* environment variables) ?", bool(sys.flags.ignore_environment))

    print()
    print("sys module:")
    print("  sys.getdefaultencoding() \"%s\"" % sys.getdefaultencoding())
    print("  sys.stdin.encoding \"%s\"" % sys.stdin.encoding)
    print("  sys.stdout.encoding \"%s\"" % sys.stdout.encoding)
    print("  sys.stderr.encoding \"%s\"" % sys.stderr.encoding)
    print("  sys.getfilesystemencoding() \"%s\"" % sys.getfilesystemencoding())

    print()
    print("locale module:")
    if hasattr(locale, "nl_langinfo"):
        print("  locale.nl_langinfo(locale.CODESET) \"%s\""
            % locale.nl_langinfo(locale.CODESET))
    else:
        print("  locale.nl_langinfo not available")

    try:
        print("  locale.getencoding() \"%s\"" % locale.getencoding())
    except AttributeError:
        print("  locale.getencoding() not available")

    try:
        print("  locale.getlocale()", (locale.getlocale(),))
    except AttributeError:
        print("  locale.getlocale() not available")

    try:
        print("  locale.getpreferredencoding() \"%s\""
            % locale.getpreferredencoding())
    except AttributeError:
        print("  locale.getpreferredencoding() not available")

    try:
        print("  locale.getdefaultlocale()[1] \"%s\""
            % locale.getdefaultlocale()[1])
    except AttributeError:
        print("  locale.getdefaultlocale() not available")

if __name__ == "__main__":
    main()

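Knowing the encodings is half of the picture; the other half is whether the text in question can be encoded with each of them. The sketch below is an assumed companion to the script, not part of it: candidate_encodings and encodable_report are hypothetical helper names, and the sample string is arbitrary.

```python
import locale
import sys


def candidate_encodings():
    """Collect the encodings Python reports from several of the places above."""
    found = {
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    }
    for name in ("stdin", "stdout", "stderr"):
        stream = getattr(sys, name, None)
        # A stream can be None or replaced by an object without an encoding.
        encoding = getattr(stream, "encoding", None)
        if encoding:
            found["sys.%s.encoding" % name] = encoding
    return found


def encodable_report(text, encodings=None):
    """Map each source to True if text can be encoded with its encoding."""
    if encodings is None:
        encodings = candidate_encodings()
    report = {}
    for source, encoding in encodings.items():
        try:
            text.encode(encoding)
            report[source] = True
        except (UnicodeEncodeError, LookupError):
            # LookupError covers encoding names Python does not recognise.
            report[source] = False
    return report


if __name__ == "__main__":
    for source, ok in encodable_report("caf\u00e9 \u2603").items():
        print("  %-40s %s" % (source, "ok" if ok else "CANNOT ENCODE"))
```

Any source reported as unable to encode the sample is a candidate for the setting that trips up the misbehaving module.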

printed values on three systems

  • Windows 10 with 3.7
  • Debian 9 with 3.5
  • Ubuntu 14 with 3.4

On Windows 10 using Python 3.7 within the built-in PowerShell terminal, this prints

PS> python.exe print-locale.py
environment:
-E (ignore PYTHON* environment variables) ? False
"LC_ALL" not set
"LANG" not set
"LC_CTYPE" not set
"LANGUAGE" not set
"PYTHONIOENCODING"="UTF-8"
"PYTHONLEGACYWINDOWSSTDIO" not set

sys module:
getdefaultencoding "utf-8"
sys.stdin.encoding "UTF-8"
sys.stdout.encoding "UTF-8"
sys.stderr.encoding "UTF-8"

locale:
locale.nl_langinfo not available
locale.getdefaultlocale()[1] "cp1252"
locale.getpreferredencoding() "cp1252"

On Debian 9 using Python 3.5, this prints

$ python print-locale.py
environment:
-E (ignore PYTHON* environment variables) ? False
"LC_ALL" not set
"LANG"="en_GB.UTF-8"
"LC_CTYPE" not set
"LANGUAGE" not set
"PYTHONIOENCODING" not set
"PYTHONLEGACYWINDOWSSTDIO" not set

sys module:
getdefaultencoding "utf-8"
sys.stdin.encoding "UTF-8"
sys.stdout.encoding "UTF-8"
sys.stderr.encoding "UTF-8"

locale:
locale.nl_langinfo(locale.CODESET) "UTF-8"
locale.getdefaultlocale()[1] "UTF-8"
locale.getpreferredencoding() "UTF-8"

On Ubuntu 14.04 using Python 3.4, this prints

$ python print-locale.py
environment:
-E (ignore PYTHON* environment variables) ? False
"LC_ALL" not set
"LANG"="en_US.UTF-8"
"LC_CTYPE" not set
"LANGUAGE"="en_US:"
"PYTHONIOENCODING" not set
"PYTHONLEGACYWINDOWSSTDIO" not set

sys module:
getdefaultencoding "utf-8"
sys.stdin.encoding "UTF-8"
sys.stdout.encoding "UTF-8"
sys.stderr.encoding "UTF-8"

locale:
locale.nl_langinfo(locale.CODESET) "UTF-8"
locale.getdefaultlocale()[1] "UTF-8"
locale.getpreferredencoding() "UTF-8"

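The Windows output is the interesting one: the standard streams report UTF-8 while the locale module reports cp1252. As far as I know, open() without an explicit encoding falls back to the locale's preferred encoding, so a module can easily write with one codec and have its bytes read with the other. This small sketch shows what such a mismatch looks like; mojibake is a made-up helper name.

```python
def mojibake(text, written_as, read_as):
    """Encode text with one codec and decode the resulting bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    original = "caf\u00e9"
    # UTF-8 bytes misread as cp1252: each non-ASCII character becomes two.
    garbled = mojibake(original, "utf-8", "cp1252")
    # ascii() keeps the demonstration printable on any console encoding.
    print(ascii(original), "->", ascii(garbled))
    # cp1252 bytes misread as UTF-8: the byte is invalid and gets replaced.
    print(ascii(original), "->", ascii(mojibake(original, "cp1252", "utf-8")))
```

If the garbage a module prints matches one of these patterns, that narrows down which pair of encodings is involved.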

Unfortunately, when I run into Unicode printing problems with installed modules, it is not immediately obvious which setting is affecting that module. Understanding how these parameters and settings interact is even more confounding, and there are many combinations to test.

But this little bit might help someone get started.

See also the helpful answers to the Stack Overflow question "How to set sys.stdout encoding in Python 3?".

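On Python 3.7 and later, one way to change a stream's encoding from inside the program is io.TextIOWrapper.reconfigure(). The sketch below is only an illustration of that call, assuming the stream is a regular text wrapper; force_utf8 is a hypothetical helper name, and backslashreplace is just one reasonable choice of error handler.

```python
import io


def force_utf8(stream, errors="backslashreplace"):
    """Switch a text stream to UTF-8 in place when it supports reconfigure()."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False  # not a TextIOWrapper, or Python older than 3.7
    reconfigure(encoding="utf-8", errors=errors)
    return True


if __name__ == "__main__":
    # Demonstrate on an in-memory stream that starts out as ASCII.
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii")
    force_utf8(stream)
    stream.write("caf\u00e9")
    stream.flush()
    print(raw.getvalue())

    # For the real streams, the equivalent calls early in a program would be
    # force_utf8(sys.stdout) and force_utf8(sys.stderr), after importing sys.
```

This only helps when the bytes are consumed by something that expects UTF-8; a terminal using another code page would still display them incorrectly.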


Some help came from a PyMOTW article and from the Python documentation: the Unicode HOWTO, the sys module, and the locale module.

JamesThomasMoon