I have two programs, the first produces UTF-8 encoded Unicode bytes on its standard output, and the second accepts UTF-8 encoded bytes on standard input. I would like to pipe the output of the first into the input of the second.
In cmd, this works as I'd expect (I'm using Python with -x utf8
to give a reproducible example):
>py -X utf8 -c "print('\N{Snowman}\N{Pile of poo}')" | py -X utf8 -c "import sys; print(sys.stdin.read())"
☃
In Powershell Core, though, the data gets mangled:
PS>py -X utf8 -c "print('\N{Snowman}\N{Pile of poo}')" | py -X utf8 -c "import sys; print(sys.stdin.read())"
ÔÿâƒÆ®
Altering the codepage to 65001 using chcp
doesn't have any effect on this. This is Powershell Core 7.0.0-rc3, and $OutputEncoding is set to UTF-8 (the default, I assume, as I didn't change it).
Powershell for Windows gives ???????
by default - presumably this is because $OutputEncoding is ASCII - changing it to UTF-8 gives the same behaviour as Powershell Core.
Two (related) questions:
- How do I get the same behaviour as cmd in Powershell?
- What exactly is going on when I use
|
between two native programs like this?
I found Changing PowerShell's default output encoding to UTF-8, which is very helpful but refers to >
/ >>
and Out-File
, not |
.
I also found https://gist.github.com/xoner/4671514, which suggests [Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8
. This seems to fix the issue, but I have no idea why - so my second question above (what is going on) very much still applies. Also, I'd be interested in knowing if there are any downsides to putting [Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8
in my profile, given that it seems to fix this issue... (For example, what if my two programs were producing bytes that weren't UTF-8 encoded text?)