Although Python 3.8 is supposed to use UTF-8 on Windows as well (PEP 528, PEP 529), one still gets

UnicodeEncodeError: 'charmap' codec can't encode character '\u251c' in position 0: character maps to <undefined>

The exception is raised in the cp1252.py codec module.

Example code (t.py):

print(b'\xe2\x94\x9c'.decode('utf-8'))
print(b'\xe2\x94\x94'.decode('utf-8'))
print(b'\xe2\x94\x80'.decode('utf-8'))
print(b'\xe2\x94\x82'.decode('utf-8'))

The error does not occur when running python t.py directly in the console, but it does occur when piping the output

python t.py | python -c "import sys; print(sys.stdin.read())"

or redirecting the output to a file (python t.py > t.txt).
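To narrow this down, it can help to print which encodings Python actually picked. The sketch below is only a diagnostic; `describe_stdio_encoding` is a made-up helper name, and the expectation that stdout reports cp1252 when piped or redirected is an assumption based on the cp1252.py traceback above.

```python
import locale
import sys


def describe_stdio_encoding():
    """Collect the encodings that decide how print() encodes text.

    Hypothetical diagnostic helper, not part of any library.
    """
    stdout = sys.stdout
    isatty = getattr(stdout, "isatty", None)
    return {
        # Default used for files and redirected streams (the ANSI code page on Windows).
        "preferred": locale.getpreferredencoding(False),
        # Encoding of the current stdout object; None if the object has no such attribute.
        "stdout": getattr(stdout, "encoding", None),
        # True only when stdout is attached to a terminal/console.
        "stdout_is_console": bool(isatty and isatty()),
    }


if __name__ == "__main__":
    # Write to stderr so the report stays visible when stdout is piped or redirected.
    print(describe_stdio_encoding(), file=sys.stderr)
```

Running this once directly and once with `> t.txt` should show whether the two cases really use different stdout encodings.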

Roland Puntaier

1 Answer

Adding

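import sys
import codecs

# Re-wrap the underlying binary streams so that text is always encoded as UTF-8.
try:
    sys.stdin = codecs.getreader("utf-8")(sys.stdin.detach())
    sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
except Exception:
    # Replacing the streams can fail, e.g. under pytest's capsys capture.
    pass

print(b'\xe2\x94\x9c'.decode('utf-8'))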

helps. It is taken from one of the answers further down in this question:

Python, Unicode, and the Windows console

The try/except is useful when running under pytest: its capsys fixture conflicts with this code and makes replacing sys.stdout fail.
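On Python 3.7+, a possibly gentler variant is `io.TextIOWrapper.reconfigure()`, which changes the encoding of the existing stream object instead of replacing it. This is only a sketch: `force_utf8` is a hypothetical helper name, and whether it avoids the capsys conflict has not been verified here.

```python
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 in place; return True on success.

    Hypothetical helper for illustration, not part of any library.
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # Stream objects substituted by other tools may not offer reconfigure().
        return False
    try:
        reconfigure(encoding="utf-8")
    except (ValueError, OSError):
        # For example, the encoding cannot be changed after reading has started.
        return False
    return True


if __name__ == "__main__":
    # Print the box-drawing characters only if the switch succeeded.
    if force_utf8(sys.stdout):
        print("\u251c\u2500\u2514\u2502")
```

Because the function reports failure instead of raising, the caller can decide whether to fall back to the platform default.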

Roland Puntaier
  • For redirected standard I/O, you can override the ANSI default via the `PYTHONIOENCODING` environment variable (see the [docs](https://docs.python.org/3/using/cmdline.html#environment-variables)). More radically, you can set the `PYTHONUTF8` environment variable for 3.7+. This forces `locale.getpreferredencoding()` to UTF-8, which overrides the default for all files. – Eryk Sun Nov 20 '19 at 18:30
  • Thanks for the info. One has no influence on the environment of a package user, though. – Roland Puntaier Nov 21 '19 at 09:16
  • A package shouldn't modify the standard streams, unless it's a voluntary, opt-in configuration. Python's documented behavior is to use `locale.getpreferredencoding()` when opening files, unless overridden by `PYTHONIOENCODING` for stdin, stdout, and stderr. In Windows, this is the system ANSI codepage. An exception is made for the Windows console. By default, Python 3.6+ uses the console's UTF-16 API and presents this to scripts as a UTF-8 file. But if `PYTHONLEGACYWINDOWSSTDIO` is defined, it uses the legacy console codepage instead (OEM, typically), as do Python versions prior to 3.6. – Eryk Sun Nov 21 '19 at 09:42
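
When the script is launched from another Python process, the environment-variable route from the comments can be applied without touching the script itself. A minimal sketch, assuming the caller controls how the child is started; `run_with_utf8_stdio` is a hypothetical helper name. Setting `PYTHONUTF8=1` in the same way would be the broader 3.7+ alternative.

```python
import os
import subprocess
import sys


def run_with_utf8_stdio(code):
    """Run `code` in a child interpreter whose standard streams use UTF-8.

    Hypothetical helper for illustration, not part of any library.
    """
    # Copy the current environment and override only the stdio encoding.
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,  # piped, i.e. the case that failed in the question
        check=True,
    )
    # The child encoded its output as UTF-8, so decode it the same way.
    return result.stdout.decode("utf-8")
```

For example, `run_with_utf8_stdio("print('\\u251c')")` should return the box-drawing character followed by a line ending, even though the child's stdout is a pipe.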