tl;dr
Use the $OutputEncoding preference variable:
In Windows PowerShell:
# Using the system's legacy ANSI code page, as Python does by default.
# NOTE: The & { ... } enclosure isn't strictly necessary, but
# ensures that the $OutputEncoding change is only temporary,
# by limiting it to the child scope that the enclosure creates.
& {
$OutputEncoding = [System.Text.Encoding]::Default
"‘I am well’ he said." | python -c 'import sys; print(sys.stdin.read())'
}
# Using UTF-8 instead, which is generally preferable.
# Note the `-X utf8` option (Python 3.7+)
& {
$OutputEncoding = [System.Text.UTF8Encoding]::new()
"‘I am well’ he said." | python -X utf8 -c 'import sys; print(sys.stdin.read())'
}
In PowerShell (Core) 7+:
# Using the system's legacy ANSI code page, as Python does by default.
# Note: In PowerShell (Core) / .NET 5+,
# `[System.Text.Encoding]::Default` now reports UTF-8,
# not the active ANSI encoding.
& {
$OutputEncoding = [System.Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage)
"‘I am well’ he said." | python -c 'import sys; print(sys.stdin.read())'
}
# Using UTF-8 instead, which is generally preferable.
# Note the `-X utf8` option (Python 3.7+)
# NO need to set $OutputEncoding, as it now *defaults* to UTF-8
"‘I am well’ he said." | python -X utf8 -c 'import sys; print(sys.stdin.read())'
Note:
$OutputEncoding controls what encoding is used to send data TO external programs via the pipeline (to stdin). It defaults to ASCII(!) in Windows PowerShell, and UTF-8 in PowerShell (Core).
[Console]::OutputEncoding controls how data received FROM external programs (via stdout) is decoded. It defaults to the console's active code page, which in turn defaults to the system's legacy OEM code page (such as 437 on US-English systems).
That these two encodings are not aligned by default is unfortunate; while Windows PowerShell will see no more changes, there is hope for PowerShell (Core): it would make sense to have it default consistently to UTF-8:
GitHub issue #7233 suggests at least defaulting the shortcut files that launch PowerShell to UTF-8 (code page 65001); GitHub issue #14945 more generally discusses the problematic mismatch.
In Windows 10 and above, there is an option to switch to UTF-8 system-wide, which then makes both the OEM and ANSI code pages default to UTF-8 (65001); however, this has far-reaching consequences and is still labeled as being in beta as of Windows 11 - see this answer.
Background information:
It is the $OutputEncoding preference variable that determines what character encoding PowerShell uses to send data (invariably text, as of PowerShell 7.3) to an external program via the pipeline.
Note that this even applies when data is read from a file: as of v7.3, PowerShell never sends raw bytes through the pipeline. Instead, it reads the content into .NET strings first and then re-encodes them based on $OutputEncoding when sending them through the pipeline to an external program.
Therefore, what encoding your ansi.txt input file uses is ultimately irrelevant, as long as PowerShell decodes it correctly when reading it into .NET strings (which are internally composed of UTF-16 code units).
See this answer for more information.
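The decode-then-re-encode step described above can be mimicked in a few lines of Python. This is only an analogy, not PowerShell's actual implementation: the helper name `reencode` is hypothetical, and `cp1252` merely stands in for an ANSI code page.

```python
def reencode(raw: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Decode raw bytes into a string, then encode that string for the target."""
    text = raw.decode(source_encoding)   # analogous to reading into .NET strings
    return text.encode(target_encoding)  # analogous to applying $OutputEncoding


text = "‘I am well’ he said."

# The same content stored in two differently encoded (hypothetical) files.
ansi_file_bytes = text.encode("cp1252")
utf8_file_bytes = text.encode("utf-8")

# The on-disk bytes differ ...
print(ansi_file_bytes == utf8_file_bytes)  # False

# ... but once decoded correctly, both yield identical bytes for the target.
sent_from_ansi = reencode(ansi_file_bytes, "cp1252", "utf-8")
sent_from_utf8 = reencode(utf8_file_bytes, "utf-8", "utf-8")
print(sent_from_ansi == sent_from_utf8)  # True
```

What matters is that the source encoding is identified correctly at read time; the bytes that end up being sent depend only on the target encoding.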
Thus, the character encoding stored in $OutputEncoding must match the encoding that the target program expects.
By default, the encoding in $OutputEncoding is unrelated to the encoding implied by the console's active code page (which itself defaults to the system's legacy OEM code page, such as 437 on US-English systems). Legacy console applications, at least, tend to use that code page; Python, however, does not and uses the legacy ANSI code page instead, while other modern CLIs, notably NodeJS' node.exe, always use UTF-8.
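To see why it matters which of these encodings a program assumes, the following sketch encodes a single character three ways. The choice of `cp437` as the OEM code page and `cp1252` as the ANSI code page is an assumption that matches US-English systems; other locales use different code pages.

```python
sample = "é"

# Assumed stand-ins: cp437 for an OEM code page, cp1252 for an ANSI code page.
encoded = {name: sample.encode(name) for name in ("cp437", "cp1252", "utf-8")}

# Print the byte sequence each encoding produces for the same character.
for name, raw in encoded.items():
    print(f"{name}: {raw.hex(' ')}")
```

The three byte sequences are all different, so a receiver that guesses the wrong encoding will misinterpret the character.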
While $OutputEncoding's default in PowerShell (Core) 7+ is now UTF-8, Windows PowerShell's default is, regrettably, ASCII(!), which means that non-ASCII characters get "lossily" transliterated to verbatim ASCII ? characters, which is what you saw.
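The effect of an ASCII default can be reproduced in Python as a rough analogy. This does not run PowerShell's code path; it merely uses Python's `errors="replace"` handler, which likewise substitutes `?` for characters that ASCII cannot represent.

```python
text = "‘I am well’ he said."

# errors="replace" substitutes "?" for every character ASCII cannot represent,
# mimicking the lossy transliteration described above.
ascii_bytes = text.encode("ascii", errors="replace")

# What a program reading these bytes would end up with.
received = ascii_bytes.decode("ascii")
print(received)  # ?I am well? he said.
```

The curly quotes are gone for good at this point: no decoding choice on the receiving side can recover them.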
Therefore, you must (temporarily) set $OutputEncoding to the encoding that Python expects and/or ask it to use UTF-8 instead.
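The need for sender and receiver to agree can be simulated without any pipeline at all. The sketch below wraps raw bytes in a text stream, roughly as a text-mode stdin would; the helper name `read_as_stdin` is hypothetical, `cp1252` is an assumed ANSI code page, and `errors="replace"` is chosen here for illustration rather than taken from Python's actual stdin defaults.

```python
import io

text = "‘I am well’ he said."


def read_as_stdin(raw: bytes, encoding: str) -> str:
    """Decode raw bytes through a text stream that uses the given encoding."""
    # errors="replace" is an illustrative choice, not Python's stdin default.
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, errors="replace")
    return stream.read()


# Sender and receiver agree on UTF-8: the text survives intact.
matched = read_as_stdin(text.encode("utf-8"), "utf-8")

# Sender uses UTF-8 but the receiver assumes cp1252: the quotes are garbled.
mismatched = read_as_stdin(text.encode("utf-8"), "cp1252")

print(matched == text)     # True
print(mismatched == text)  # False
```

In the real scenario, the first argument corresponds to what $OutputEncoding produces and the second to what Python assumes for stdin, which `-X utf8` switches to UTF-8.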