
Using Python 3, running the following code

print("some box drawing:")
print("┌─┬┼┴┐")

via

py my_app.py

prints

some box drawing:
┌─┬┼┴┐

As you would expect.

However, if you redirect the output (on either Windows or Linux) with

py my_app.py > redirected.txt

you get the following exception:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>
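
The 'charmap' codec in this error is a legacy single-byte code page. A redirected stdout on Windows typically falls back to the ANSI code page (for example cp1252), which has no box-drawing characters. The sketch below checks which encodings can represent the string at all; the list of encodings is only illustrative.

```python
# The box-drawing characters from the question
BOX = "┌─┬┼┴┐"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# cp1252 lacks these characters; cp437 (the old DOS code page) and UTF-* have them
for enc in ("cp1252", "cp437", "utf-8", "utf-16"):
    print(f"{enc:>7}: {can_encode(BOX, enc)}")
```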

As many other posts suggest, this exception can be "fixed" by calling sys.stdout.reconfigure(encoding='utf-8') before printing. On Linux and in the Windows cmd console, that's it, problem solved. In Windows PowerShell, however, the redirected output looks like this:
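
To see what the reconfigure call changes without touching the real console, here is a minimal sketch that simulates a redirected stdout with an in-memory byte buffer. Using cp1252 as the starting encoding is an assumption standing in for the Windows default.

```python
import io

# Simulated redirected stdout: a text layer over raw bytes, starting with a
# legacy code page (cp1252 is an assumed stand-in for the Windows default).
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp1252", newline="\n")

# Same call as sys.stdout.reconfigure(encoding="utf-8"); it has to happen
# before anything the old encoding cannot represent is written.
stream.reconfigure(encoding="utf-8")
stream.write("┌─┬┼┴┐\n")
stream.flush()

data = raw.getvalue()  # the bytes that would end up in redirected.txt
print(data)
```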

some box drawing:
ΓöîΓöÇΓö¼Γö╝Γö┤ΓöÉ
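
This garbled line matches exactly what you get when the UTF-8 bytes Python wrote are decoded as code page 437, the default OEM console code page on U.S. Windows. The sketch reproduces the string. Which part of PowerShell's redirection does this re-decoding is an assumption the sketch does not verify.

```python
BOX = "┌─┬┼┴┐"

utf8_bytes = BOX.encode("utf-8")     # what Python emits after reconfigure(encoding='utf-8')
misread = utf8_bytes.decode("cp437")  # the same bytes interpreted with the OEM code page
print(misread)
```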

This is especially odd, since it works fine in the cmd.exe console.

The code base is delivered to a customer as an executable, and I would prefer not to ask them to run something in the console for my program to work reliably. Is there a programmatic way to have box-drawing characters written correctly when redirecting output to a file in Windows PowerShell?

Neuron

2 Answers


From this answer, I learned that redirecting to UTF-8 in PowerShell simply does not work, but UTF-16 does. Running the following code at startup worked for me with and without redirection, and in a number of different consoles:

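import os
import sys

# Only change the encoding when stdout is not a terminal (redirected or piped).
is_redirected = not sys.stdout.isatty()
if is_redirected:
    # Heuristic: PowerShell sessions have a PSModulePath with 3 or more entries.
    is_power_shell = len(os.getenv('PSModulePath', '').split(os.pathsep)) >= 3
    if is_power_shell:
        # UTF-16 survived PowerShell's redirect in my tests; UTF-8 came out garbled.
        sys.stdout.reconfigure(encoding='utf-16')
    else:
        sys.stdout.reconfigure(encoding='utf-8')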

I decided to set the encoding only when output is redirected, and to use UTF-16 only in PowerShell, because I wanted to avoid running into other unforeseen encoding problems with other setups. The PowerShell detection snippet is taken from this answer, and the redirect detection snippet from this answer.
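
The decision logic in the startup snippet can be pulled into a pure function so it can be unit-tested without a real console or PowerShell session. This is a refactoring sketch: `choose_stdout_encoding` and `apply_stdout_encoding` are hypothetical names, and the PowerShell heuristic is kept exactly as in the answer.

```python
import os
import sys
from typing import Optional


def choose_stdout_encoding(is_redirected: bool, ps_module_path: str) -> Optional[str]:
    """Return the encoding to pass to reconfigure, or None to leave stdout alone."""
    if not is_redirected:
        return None  # interactive console output is left untouched
    # Same heuristic as the answer: 3+ PSModulePath entries means PowerShell
    is_power_shell = len(ps_module_path.split(os.pathsep)) >= 3
    return "utf-16" if is_power_shell else "utf-8"


def apply_stdout_encoding() -> None:
    """Apply the choice to the real sys.stdout; call once at startup."""
    enc = choose_stdout_encoding(not sys.stdout.isatty(), os.getenv("PSModulePath", ""))
    if enc is not None:
        sys.stdout.reconfigure(encoding=enc)
```

Keeping the environment lookup out of `choose_stdout_encoding` lets tests feed it arbitrary `PSModulePath` values.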

I myself find this solution a little messy. If you find a better solution, I am happy to accept it.

Neuron

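When Python runs directly in a Windows console, it internally uses Unicode APIs to write to the cmd window and doesn't care what the console encoding is set to. When output is redirected to a file, however, it has no way of knowing which encoding the reader expects, so it falls back to an OS-specific default (Windows-1252 on U.S. Windows). That's why you can use sys.stdout.reconfigure to tell it.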
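
A small diagnostic sketch (the function name is hypothetical) that reports what Python chose for stdout. Run it once plainly and once with `> out.txt` to compare. On Windows, the console case typically reports `utf-8`, and the redirected case typically matches the locale's preferred encoding. The report goes to stderr so it stays visible when stdout is redirected.

```python
import locale
import sys


def describe_stdout() -> dict:
    """Collect how stdout is set up for the current run."""
    return {
        "isatty": sys.stdout.isatty(),
        "stdout_encoding": sys.stdout.encoding,
        # The fallback Python uses for redirected streams (unless overridden)
        "locale_preferred": locale.getpreferredencoding(False),
    }


print(describe_stdout(), file=sys.stderr)
```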

Python also honors the environment variable PYTHONIOENCODING, which tells it which encoding to use for stdin, stdout and stderr. chcp is the shell command that tells you which code page the terminal expects.
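
As an alternative to the batch file suggested in the comments, a small Python launcher can set PYTHONIOENCODING for a child process. This is a sketch: `run_with_io_encoding` is a hypothetical helper, and it only controls the bytes the child writes, not how PowerShell later handles them.

```python
import os
import subprocess
import sys


def run_with_io_encoding(code: str, encoding: str) -> bytes:
    """Run a child Python with PYTHONIOENCODING set and return its raw stdout bytes."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# The child source uses \u escapes so the command line itself stays ASCII.
out = run_with_io_encoding(
    "print('\\u250c\\u2500\\u252c\\u253c\\u2534\\u2510')", "cp437"
)
print(out.rstrip(b"\r\n"))  # the cp437 bytes for the six box characters
```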

Example:

C:\tmp>chcp
Active code page: 437                     # Legacy U.S. DOS encoding

C:\tmp>py -c "print('┌─┬┼┴┐')"            # this uses Unicode APIs
┌─┬┼┴┐

C:\tmp>py -c "print('┌─┬┼┴┐')" >x         # this uses an OS-specific default encoding
Traceback (most recent call last):        # Windows-1252 on U.S. Windows.
  File "<string>", line 1, in <module>
  File "D:\dev\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>

C:\tmp>set PYTHONIOENCODING=cp437         # code page 437 supports box drawing characters

C:\tmp>py -c "print('┌─┬┼┴┐')" >x         # file is written encoded in cp437

C:\tmp>type x                             # matches terminal encoding and displays correctly
┌─┬┼┴┐

C:\tmp>chcp 65001                         # UTF-8 code page
Active code page: 65001

C:\tmp>type x                             # cp437 doesn't decode properly
������

C:\tmp>set PYTHONIOENCODING=utf8          # Use UTF8

C:\tmp>py -c "print('┌─┬┼┴┐')" >x         # write file encoded in UTF8

C:\tmp>type x                             # matches terminal code page now
┌─┬┼┴┐
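
The `������` line from the `chcp 65001` step can be reproduced in Python. None of the six cp437 bytes forms a valid UTF-8 sequence, so a UTF-8 reader replaces each one with U+FFFD. A minimal sketch:

```python
BOX = "┌─┬┼┴┐"

cp437_bytes = BOX.encode("cp437")                      # file written with PYTHONIOENCODING=cp437
shown = cp437_bytes.decode("utf-8", errors="replace")  # the same file read as UTF-8 (chcp 65001)
print(cp437_bytes.hex(), len(shown), shown == "\ufffd" * len(BOX))
```
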
Mark Tolonen
  • Thanks for taking the time to write an answer. Unfortunately, it does not answer my question. I know about `sys.stdout.reconfigure`, as I described in my question, but it does not fix the problem when using PowerShell. Also, as described in the question, I am looking for a solution that works programmatically, so setting environment variables is not an option – Neuron Apr 06 '23 at 17:37
  • @Neuron Use a batch file to set the variable "programmatically" and run Python. – Mark Tolonen Apr 06 '23 at 17:48
  • Even then, setting the encoding to UTF-8 does not solve the problem, as I stated in my question – Neuron Apr 06 '23 at 17:54
  • @Neuron given the constraints you've placed the problem may not be solvable. But I'd try using `reconfigure` with `cp437` instead of `utf-8`. – Mark Ransom Apr 06 '23 at 17:54
  • You have to set it to the encoding of the shell if you want it to display correctly in the shell. That goes for cmd as well as PowerShell. If the cmd default isn't UTF-8, it won't display correctly either. I'm using cmd above. I don't really see how to get around that. If the user is redirecting the output, they have to choose the correct encoding. The only real way to avoid it is to stick to ASCII characters, since they are encoded the same in nearly all encodings. – Mark Tolonen Apr 06 '23 at 17:54
  • Another option is to always force UTF-8, but document that the user should use an editor that understands UTF-8 or set the shell to UTF-8, e.g. `chcp 65001`. – Mark Tolonen Apr 06 '23 at 18:01
  • @MarkRansom It is possible. I have since posted my own answer to the question – Neuron Apr 06 '23 at 19:35
  • @Neuron what are you using to display the file? Perhaps it automatically recognizes UTF-16 but not UTF-8. I don't trust that your answer will work in all circumstances. – Mark Ransom Apr 06 '23 at 19:52
  • @MarkRansom I have checked in Notepad++ and manually selected both UTF-8 and UTF-16. It displays correctly with both. Like I said in my answer, it's not an elegant solution and a little hacky, but it is the best I have so far – Neuron Apr 06 '23 at 20:40
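
The "stick to ASCII" suggestion from the comments can be automated: output the box characters only when the target encoding can represent them, and substitute ASCII otherwise. This is a sketch with a hypothetical `render` helper and an assumed character mapping. It avoids the UnicodeEncodeError, but it cannot detect the PowerShell re-decoding problem, because UTF-8 can encode these characters.

```python
import sys
from typing import Optional

# Assumed ASCII stand-ins for the box-drawing characters used in the question
ASCII_FALLBACK = str.maketrans({"┌": "+", "┐": "+", "┬": "+", "┴": "+", "┼": "+", "─": "-"})


def render(text: str, encoding: Optional[str] = None) -> str:
    """Return text unchanged if the target encoding can represent it, else an ASCII version."""
    enc = encoding or sys.stdout.encoding or "ascii"
    try:
        text.encode(enc)
    except UnicodeEncodeError:
        return text.translate(ASCII_FALLBACK)
    return text


print(render("┌─┬┼┴┐"))  # safe to print: falls back to ASCII if stdout can't encode it
```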