
I have two programs, the first produces UTF-8 encoded Unicode bytes on its standard output, and the second accepts UTF-8 encoded bytes on standard input. I would like to pipe the output of the first into the input of the second.

In cmd, this works as I'd expect (I'm using Python with -X utf8 to give a reproducible example):

>py -X utf8 -c "print('\N{Snowman}\N{Pile of poo}')" | py -X utf8 -c "import sys; print(sys.stdin.read())"
☃💩

In PowerShell Core, though, the data gets mangled:

PS>py -X utf8 -c "print('\N{Snowman}\N{Pile of poo}')" | py -X utf8 -c "import sys; print(sys.stdin.read())"
Ôÿ⭃Ʈ

Changing the code page to 65001 with chcp has no effect. This is PowerShell Core 7.0.0-rc3, and $OutputEncoding is set to UTF-8 (the default, I assume, as I didn't change it).

Windows PowerShell gives ??????? by default, presumably because its $OutputEncoding is ASCII; changing $OutputEncoding to UTF-8 gives the same behaviour as PowerShell Core.
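A rough model may help with question 2. cmd connects the two programs' byte streams directly. According to the comments on this question, PowerShell instead decodes the first program's output into text using [Console]::OutputEncoding and then re-encodes that text with $OutputEncoding before passing it to the second program. The Python sketch below simulates that round trip. The function name `pipe_through_powershell` is hypothetical, and the OEM code page cp850 is an assumption (your system's may differ, e.g. cp437). The model also ignores PowerShell's line-by-line handling of native output.

```python
# Hypothetical model of a PowerShell pipe between two native programs:
# decode stdout bytes with [Console]::OutputEncoding, then re-encode the
# resulting text with $OutputEncoding for the next program's stdin.
# ASSUMPTION: the console's OEM code page is cp850.

def pipe_through_powershell(data: bytes, console_output_encoding: str,
                            output_encoding: str) -> bytes:
    """Return the bytes the second program would receive (simplified)."""
    text = data.decode(console_output_encoding)            # receiving side
    return text.encode(output_encoding, errors="replace")  # sending side

# What the first program writes: UTF-8 bytes for snowman + pile of poo
original = "\N{SNOWMAN}\N{PILE OF POO}".encode("utf-8")

# PowerShell Core: OEM code page in, UTF-8 out -> mojibake
core = pipe_through_powershell(original, "cp850", "utf-8")
print(core.decode("utf-8"))

# Windows PowerShell: OEM code page in, ASCII out -> '?' per non-ASCII char
winps = pipe_through_powershell(original, "cp850", "ascii")
print(winps.decode("ascii"))
```

Under the cp850 assumption, each of the seven UTF-8 bytes decodes to one character. The mojibake then starts with Ôÿ, like the PowerShell Core output above. Encoding those seven non-ASCII characters as ASCII gives seven question marks, like the Windows PowerShell output.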

Two (related) questions:

  1. How do I get the same behaviour as cmd in PowerShell?
  2. What exactly is going on when I use | between two native programs like this?

I found the question "Changing PowerShell's default output encoding to UTF-8", which is very helpful but covers > / >> and Out-File, not |.

I also found https://gist.github.com/xoner/4671514, which suggests [Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8. This seems to fix the issue, but I have no idea why, so my second question above (what is going on) still very much applies. I'd also like to know whether there are any downsides to putting [Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8 in my profile, given that it seems to fix this issue. (For example, what if my two programs produced bytes that weren't UTF-8-encoded text?)
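The same kind of model can show why the [Console]::OutputEncoding fix works and where its downside might lie. In this sketch both settings are UTF-8. In PowerShell Core, $OutputEncoding already defaults to UTF-8, so setting [Console]::OutputEncoding is enough to get there. UTF-8 text survives the round trip unchanged. Bytes that are not valid UTF-8 cannot survive it.

The sketch assumes invalid sequences are replaced with U+FFFD. That is what Python's `errors="replace"` does, and, as far as I know, what .NET's default UTF-8 decoder does as well. The helper name is hypothetical, and the Latin-1 and binary inputs are made-up examples.

```python
def pipe_through_powershell(data: bytes, console_output_encoding: str,
                            output_encoding: str) -> bytes:
    """Hypothetical model: decode with the receiving encoding, re-encode with the sending one."""
    text = data.decode(console_output_encoding, errors="replace")  # invalid bytes -> U+FFFD
    return text.encode(output_encoding, errors="replace")

samples = {
    "UTF-8 text": "\N{SNOWMAN}\N{PILE OF POO}".encode("utf-8"),
    "Latin-1 text": "café".encode("latin-1"),    # b"caf\xe9": not valid UTF-8
    "binary": bytes([0x00, 0xFF, 0x80, 0x41]),  # 0xFF and 0x80 are invalid UTF-8
}

results = {}
for label, data in samples.items():
    # Both settings UTF-8, as after setting [Console]::OutputEncoding in PowerShell Core
    received = pipe_through_powershell(data, "utf-8", "utf-8")
    results[label] = received
    print(f"{label}: sent {data!r}, received {received!r}, intact: {received == data}")
```

If this model is right, the profile setting is harmless as long as everything you pipe between native programs is UTF-8 text. Any other bytes would be altered in transit.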

Paul Moore
  • `$OutputEncoding` determines the encoding PowerShell uses when sending data to external programs. `[Console]::OutputEncoding` determines the encoding PowerShell uses to decode what it receives from external programs. Windows PowerShell defaults to ASCII for both. PowerShell Core defaults to UTF-8 with no BOM for both. In Windows PowerShell, `Out-File` uses UTF-16LE ("Unicode"), I believe. Keep in mind that `[Text.UTF8Encoding]::UTF8` is UTF-8 with a BOM in Windows PowerShell. Personally, I don't see downsides to changing your encoding, but IMO you should change both `$OutputEncoding` and `[Console]::OutputEncoding` so they match. – AdminOfThings Mar 03 '20 at 22:26
  • Some defaults are influenced by the system's active code page. I can never remember them, because different native PowerShell commands behave differently. Windows PowerShell's encoding handling is a mess, frankly. Core did it better and more consistently. – AdminOfThings Mar 03 '20 at 22:32
  • I hope [this answer](https://stackoverflow.com/a/59118502/45375) to the linked question addresses your questions; let us know if it doesn't. – mklement0 Mar 04 '20 at 03:17
  • @AdminOfThings Thanks. From my experiments, `[Console]::OutputEncoding` in PowerShell Core doesn't use UTF-8; it uses the default OEM code page. Given that these two settings govern the two halves of a pipe, wouldn't it count as a bug in PowerShell that their defaults aren't the same? – Paul Moore Mar 05 '20 at 08:35
  • @mklement0 Thanks, that does seem to cover it. I'll read the full answer there and see if there's anything I still don't understand, but it looks pretty comprehensive. It's annoying that PowerShell doesn't support "proper" piping between native programs, but I guess that's just how it is. I might raise a feature request for it and see what happens. – Paul Moore Mar 05 '20 at 08:39
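AdminOfThings' remark about the BOM matters if you also set $OutputEncoding in Windows PowerShell. If the encoding used for sending data to external programs emits a byte-order mark, the receiving program sees three extra bytes at the start of its input. The Python sketch below uses the `utf-8-sig` codec as a stand-in for a BOM-emitting UTF-8 encoding. Whether a given PowerShell version actually sends the BOM is an assumption here; the thread does not verify it.

```python
# Stand-in for a BOM-emitting UTF-8 encoding.
# ASSUMPTION: the sender prepends the BOM once, at the start of the stream.
payload = "\N{SNOWMAN}\N{PILE OF POO}"

with_bom = payload.encode("utf-8-sig")  # b"\xef\xbb\xbf" + UTF-8 bytes
without_bom = payload.encode("utf-8")

# A receiver that decodes plain UTF-8 keeps the BOM as a U+FEFF character...
print(repr(with_bom.decode("utf-8")))
# ...whereas decoding with utf-8-sig strips a leading BOM if one is present.
print(repr(with_bom.decode("utf-8-sig")))
# The BOM bytes and how many extra bytes they add
print(with_bom[:3].hex(), len(with_bom) - len(without_bom))
```

This is one more reason to prefer a BOM-less UTF-8 encoding for piping between native programs.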

0 Answers