It looks like the character encoding of the original text was misinterpreted when the text was converted to .NET strings.
Specifically, it looks like UTF-8-encoded text was misinterpreted as "ANSI"-encoded or, in the context of cmdlets such as Invoke-WebRequest, as a similar fixed-width single-byte encoding such as ISO-8859-1. As a result, each byte in the UTF-8 input became a character in its own right, even though UTF-8 encodes non-ASCII-range characters as multiple bytes.
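To make the misinterpretation concrete, here is a minimal sketch that reproduces it on purpose. The variable names are illustrative only, and the characters are built from code points so that the encoding of the script file itself plays no role.

```powershell
# Start with the intended character: 'é' (U+00E9).
$original = [string][char]0xE9

# UTF-8 encodes this single character as TWO bytes: 0xC3 0xA9.
$utf8Bytes = [Text.Encoding]::UTF8.GetBytes($original)

# Decode those bytes with the WRONG encoding (ISO-8859-1, code page 28591):
# every byte becomes a character of its own.
$garbled = [Text.Encoding]::GetEncoding(28591).GetString($utf8Bytes)

# The garbled string has two characters, U+00C3 ('Ã') and U+00A9 ('©').
'Original: {0}  Misread: {1}  Bytes: {2}  Characters after misreading: {3}' -f
    $original, $garbled, $utf8Bytes.Length, $garbled.Length
```

The two-character result is the typical symptom: one accented letter shows up as 'Ã' followed by a second symbol.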
To correct this problem, you must re-encode the string:
First, convert the misinterpreted string back to bytes using the encoding that was mistakenly applied to the input, so as to recover the original byte representation.
Then, convert these bytes back to a string using the true encoding, namely UTF-8.
# Note: Works in Windows PowerShell only - in PowerShell Core,
# [Text.Encoding]::Default is *invariably* UTF-8.
$originalBytes = [Text.Encoding]::Default.GetBytes('Ã©')
[Text.Encoding]::Utf8.GetString($originalBytes)
The above yields é.
In Windows PowerShell, [Text.Encoding]::Default is your system's "ANSI" encoding; for ISO-8859-1 encoding, use [Text.Encoding]::GetEncoding(28591) instead.
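The two-step repair can also be wrapped in a small helper that takes the mistakenly applied encoding as a parameter. This is only a sketch: Repair-Utf8Mojibake is a hypothetical name, not a built-in cmdlet, and defaulting to ISO-8859-1 is an assumption that suits the Invoke-WebRequest scenario described above. Because it does not rely on [Text.Encoding]::Default, it should behave the same in Windows PowerShell and PowerShell Core.

```powershell
# Hypothetical helper: undo a UTF-8-read-as-single-byte-encoding mix-up.
function Repair-Utf8Mojibake {
    param(
        [Parameter(Mandatory)]
        [string] $Text,

        # The encoding that was wrongly applied; ISO-8859-1 is an assumed default.
        [Text.Encoding] $WrongEncoding = [Text.Encoding]::GetEncoding(28591)
    )

    # Step 1: recover the original bytes via the wrongly applied encoding.
    $bytes = $WrongEncoding.GetBytes($Text)

    # Step 2: decode those bytes with the true encoding, UTF-8.
    [Text.Encoding]::UTF8.GetString($bytes)
}

# Build the garbled input 'Ã©' from code points (U+00C3, U+00A9),
# so the script file's own encoding does not matter.
$garbled = "$([char]0xC3)$([char]0xA9)"
$fixed   = Repair-Utf8Mojibake -Text $garbled
$fixed   # 'é' (U+00E9)
```

Choosing the right value for -WrongEncoding matters: ISO-8859-1 and Windows-1252 agree for bytes 0xA0 through 0xFF but differ in the 0x80 through 0x9F range, so text whose UTF-8 bytes fall into that range only round-trips with the encoding that was actually misapplied.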
Note that the entire problem would not have arisen in PowerShell Core, which consistently defaults to (BOM-less) UTF-8.
Should you find yourself in need of using the "ANSI" encoding even in PowerShell Core, see this answer.