I've got a simple Python script, foo.py, that outputs a Unicode character:
import sys
print(sys.stdout.encoding)
print(b'\xe2\x96\x88'.decode('utf8'))
I want to run it in PowerShell and pipe the output to Write-Host:
PS> c:\python37\python.exe foo.py | Write-Host
If I do this, the result is:
Traceback (most recent call last):
  File ".\pyen.py", line 3, in <module>
    print(b'\xe2\x96\x88'.decode('utf8'))
  File "C:\python37\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2588' in position 0: character maps to <undefined>
cp1252
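For reference, the failure in the traceback can be reproduced with the codecs alone, with no PowerShell involved. This is a small sketch; `try_encode` is a hypothetical helper, and all it shows is that U+2588 (the character foo.py prints) has no cp1252 mapping, while UTF-8 encodes it as the same three bytes foo.py decodes.

```python
from typing import Union

# The same character foo.py prints: U+2588 FULL BLOCK.
CHAR = b"\xe2\x96\x88".decode("utf8")


def try_encode(text: str, encoding: str) -> Union[bytes, str]:
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


if __name__ == "__main__":
    # cp1252 has no slot for U+2588, so this yields the 'charmap' error message.
    print(repr(try_encode(CHAR, "cp1252")))
    # UTF-8 round-trips back to the original bytes: b'\xe2\x96\x88'
    print(repr(try_encode(CHAR, "utf-8")))
```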
It turns out this isn't even a Write-Host problem. Just assigning the output to a variable, or piping it to Out-Null, gives the same error:
PS> c:\python37\python.exe foo.py | Out-Null #Same error
PS> $a = c:\python37\python.exe foo.py #Same error
PS> c:\python37\python.exe foo.py #No error, stdout encoding is printed as utf-8
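One way to see the console-versus-pipe difference from the Python side is to launch a child Python with its stdout connected to a pipe, which is roughly what `| Out-Null` and `$a = ...` do, and have it report what it sees. This is a sketch under the assumption that launching from Python via `subprocess` is a fair stand-in for launching from PowerShell; `piped_stdout_report` is a hypothetical name. On the setup above I'd expect something like `False cp1252`, but the encoding reported depends on the platform, the locale, and any `PYTHONIOENCODING` already set.

```python
import subprocess
import sys

# Child program: prints whether its stdout is a terminal and which
# encoding Python picked for it.
CHILD = "import sys; print(sys.stdout.isatty(), sys.stdout.encoding)"


def piped_stdout_report() -> str:
    """Run CHILD with stdout redirected to a pipe and return its report."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,  # redirected, not attached to the console
        check=True,
    )
    # The report is plain ASCII; decode leniently just in case.
    return result.stdout.decode("ascii", errors="replace").strip()


if __name__ == "__main__":
    print("piped:", piped_stdout_report())
```

Running `python -c "import sys; print(sys.stdout.isatty(), sys.stdout.encoding)"` directly in the console gives the unredirected side of the comparison.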
I've gone down the rabbit hole of why this happens: PowerShell uses the default Windows code page (cp1252) for many things.
This answer offers a couple of solutions: Using UTF-8 Encoding (CHCP 65001) in Command Prompt / Windows Powershell (Windows 10)
Unfortunately, changing my $PROFILE to set the input and output encoding doesn't help.
The more permanent solution in that answer, enabling UTF-8 system-wide, does fix this, but it's a beta feature and can break other things, so I'd rather not go down that road.
I've also played with setting the Python environment variable for encoding and with modifying the Python source, but these aren't great answers either, since they mean tweaking or altering any Python code whose output I want to pipe to Write-Host.
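The two workarounds mentioned above look roughly like this. It's a sketch under assumptions: I'm taking `PYTHONIOENCODING` to be the environment variable in question (Python 3.7 also has a UTF-8 mode via `PYTHONUTF8`, not shown), and the in-script fix is `reconfigure()`, which text streams have had since Python 3.7. Both helper names are hypothetical.

```python
import io
import os
import subprocess
import sys


def run_with_utf8_stdio(script_args: list) -> bytes:
    """Run a Python child with PYTHONIOENCODING=utf-8 set only for that process.

    Assumption: PYTHONIOENCODING is the "encoding environment variable" meant.
    The child script itself is left untouched.
    """
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    result = subprocess.run(
        [sys.executable, *script_args],
        env=env,
        stdout=subprocess.PIPE,  # redirected, like piping into Write-Host
        check=True,
    )
    return result.stdout


def utf8_bytes_after_reconfigure(text: str) -> bytes:
    """Show the in-script fix: switch a cp1252 text stream to UTF-8 (3.7+)."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="cp1252")  # stands in for piped stdout
    # In foo.py itself this would be: sys.stdout.reconfigure(encoding="utf-8")
    stream.reconfigure(encoding="utf-8")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    block = b"\xe2\x96\x88".decode("utf8")
    print(utf8_bytes_after_reconfigure(block))  # b'\xe2\x96\x88'
    print(run_with_utf8_stdio(["-c", "print(b'\\xe2\\x96\\x88'.decode('utf8'))"]))
```

Either way, every script or its launch environment has to change, which is the drawback described above.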
Any ideas?