I have a native program written in Python that expects its input on stdin. As a simple example:
#!python3
import sys
# copy whatever arrives on stdin into foo.txt
with open('foo.txt', 'w', encoding='utf8') as f:
    f.write(sys.stdin.read())
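(For illustration, assume the script is saved as save_stdin.py; the name is made up for this question. It would then be run as, say,
PS >'¡Hola, señor!' | python .\save_stdin.py
and foo.txt should end up containing that text intact.)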
I want to be able to pass a (PowerShell) string to this program as standard input. Python expects its standard input in the encoding specified in $env:PYTHONIOENCODING, which I will typically set to utf-8 (so that I don't get any encoding errors).
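Concretely, I set it for the session roughly like this:
PS >$env:PYTHONIOENCODING = 'utf-8'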
But no matter what I do, characters get corrupted. I've searched the net and found suggestions to change [Console]::InputEncoding/[Console]::OutputEncoding, or to use chcp, but nothing seems to work.
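For reference, the variants I tried were along these lines (in various combinations), with no effect:
PS >[Console]::InputEncoding = [Text.Encoding]::UTF8
PS >[Console]::OutputEncoding = [Text.Encoding]::UTF8
PS >chcp 65001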
Here's my basic test:
PS >[Console]::OutputEncoding.EncodingName
Unicode (UTF-8)
PS >[Console]::InputEncoding.EncodingName
Unicode (UTF-8)
PS >$env:PYTHONIOENCODING
utf-8
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
´╗┐?
PS >chcp 1252
Active code page: 1252
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
?
PS >chcp 65001
Active code page: 65001
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
?
How can I fix this problem?
I can't even explain what's going on here. Basically, I want the test (python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())") to print out a Euro sign, and I need to understand why, so that I can do whatever is needed to get it to work :-) (Because then I can translate that knowledge to my real scenario, which is to be able to write working pipelines of Python programs that don't break when they encounter Unicode characters.)
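For context, a typical stage in the pipelines I have in mind is nothing more exotic than this (upcase.py is a made-up name; it just reads text from stdin and writes text to stdout):
#!python3
import sys
# uppercase every line that comes in on stdin
for line in sys.stdin:
    sys.stdout.write(line.upper())
and it's chains like
PS >python produce.py | python upcase.py | python consume.py
(all placeholder names) that currently mangle any non-ASCII characters passing through them.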