tl;dr
This normally indicates that the server is encoding the content into a response byte stream in one format (e.g. utf8) but the client is decoding the byte stream using a different format (e.g. iso-8859-1). As a result, the content decoded by the client doesn't match the original content encoded by the server.
This snippet shows the effect in action:
$originalContent = "Mężczyzna";
# encode with utf8
$encodedBytes = [System.Text.Encoding]::UTF8.GetBytes($originalContent);
# decode with iso-8859-1
$decodedContent = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetString($encodedBytes)
$decodedContent
# MÄżczyzna
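In this particular pairing the mis-decoding can be undone in memory, because iso-8859-1 maps every byte value to exactly one character, so no bytes are lost along the way. A minimal sketch of that round trip (the variable names are just for illustration):

```powershell
# build "Mężczyzna" from code points so the result doesn't depend on how this script file is saved
$originalContent = "M" + [char]0x0119 + [char]0x017C + "czyzna"

$utf8   = [System.Text.Encoding]::UTF8
$latin1 = [System.Text.Encoding]::GetEncoding("iso-8859-1")

# simulate the mismatch: encode with utf8, decode with iso-8859-1
$mangledContent = $latin1.GetString($utf8.GetBytes($originalContent))

# undo it: turn the mangled text back into the original bytes, then decode with utf8
$recoveredContent = $utf8.GetString($latin1.GetBytes($mangledContent))

# ordinal comparison, so no culture-specific rules are involved
$recoveredContent.Equals($originalContent)
# True
```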
Unfortunately, the process can't always be reversed: mis-decoding some inputs is lossy, so re-encoding the mangled text and decoding it again won't always recover the original content. However, if you write the response to disk, PowerShell will just stream the raw response bytes into a file, and you can read it back using the server's encoding format to recover the original content:
$filename = "c:\temp\response.txt";
$response = Invoke-RestMethod `
    -Uri "https://<URL>/api/MethodInvoker/InvokeServiceMethod" `
    -Method "POST" `
    -Headers $headers `
    -Body $body `
    -OutFile $filename;
#   ^^^^^^^^ ^^^^^^^^^
# write the raw byte stream to disk without (mis-)decoding it
# then read it back as utf8 (without -Encoding, Windows PowerShell assumes the system's ANSI code page)
$text = Get-Content $filename -Encoding UTF8
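To illustrate why reversing the steps in memory isn't always possible, here is a minimal sketch of a lossy pairing: content encoded as iso-8859-1 but decoded as utf8. The sample text and variable names are assumptions chosen for the demonstration, not taken from the question.

```powershell
$utf8   = [System.Text.Encoding]::UTF8
$latin1 = [System.Text.Encoding]::GetEncoding("iso-8859-1")

# "café", built from code points so the script file's own encoding doesn't matter
$originalContent = "caf" + [char]0x00E9

# encode with iso-8859-1: the "é" becomes the single byte E9
$encodedBytes = $latin1.GetBytes($originalContent)

# decode with utf8: a lone E9 is not a valid utf8 sequence,
# so the decoder substitutes U+FFFD (the replacement character)
$mangledContent = $utf8.GetString($encodedBytes)

# re-encoding can't get E9 back, because U+FFFD doesn't record which byte it replaced
$reencodedBytes = $utf8.GetBytes($mangledContent)

($encodedBytes   | ForEach-Object { $_.ToString("X2") }) -join " "
# 63 61 66 E9
($reencodedBytes | ForEach-Object { $_.ToString("X2") }) -join " "
# 63 61 66 EF BF BD
```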
More Details
The root problem seems to be caused by different interpretations of what the default encoding should be for some content types - for example:
Content-Type: application/json
Some systems (including Windows PowerShell) appear to use an older heuristic that assumes content is encoded using iso-8859-1 unless an optional charset parameter is specified on the content type - see RFC2616: Hypertext Transfer Protocol -- HTTP/1.1:
When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.
For example if Windows PowerShell receives a response with this header:
Content-Type: application/json
it will treat it like:
Content-Type: application/json;charset=iso-8859-1
whereas if the response contains this header:
Content-Type: application/json;charset=utf-8
Windows PowerShell will use utf8 to decode it instead.
This interpretation was superseded in RFC7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content where it says:
The default charset of ISO-8859-1 for text media types has been removed; the default is now whatever the media type definition says.
and since RFC8259: The JavaScript Object Notation (JSON) Data Interchange Format says:
JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].
that's what some clients do, so for those systems this:
Content-Type: application/json
is treated like
Content-Type: application/json;charset=utf-8
and they use utf8 even if no charset is specified.
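The two interpretations can be sketched as a single function with a switch. This is only an illustration of the rules described above, not how Windows PowerShell or any other client is actually implemented: Get-AssumedCharset and its -LegacyDefault switch are hypothetical names, and the header parsing is deliberately simplistic.

```powershell
function Get-AssumedCharset {
    param(
        [string] $ContentType,
        [switch] $LegacyDefault   # hypothetical switch: emulate the older iso-8859-1 default
    )

    # an explicit charset parameter always wins
    if ($ContentType -match 'charset\s*=\s*"?([^";\s]+)') {
        return $Matches[1]
    }

    # older interpretation: fall back to iso-8859-1 when no charset is given
    if ($LegacyDefault) {
        return "iso-8859-1"
    }

    # newer interpretation: the default comes from the media type definition,
    # and JSON is defined as utf-8
    $mediaType = ($ContentType -split ';')[0].Trim()
    if ($mediaType -eq "application/json") {
        return "utf-8"
    }

    # no default known for other media types in this sketch
    return $null
}

Get-AssumedCharset -ContentType "application/json" -LegacyDefault
# iso-8859-1
Get-AssumedCharset -ContentType "application/json"
# utf-8
Get-AssumedCharset -ContentType "application/json;charset=utf-8" -LegacyDefault
# utf-8
```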
You could fix the original issue by getting the owner of the website / api to add the optional charset=utf-8 parameter to the content-type header, which would improve interoperability with some clients. However, it's not strictly necessary according to the various specs, and it may not be straightforward to get applied if the site is owned by a third party.
And based on the above, the reason the Content-Type: application/json response header works in Postman is probably that it uses the newer interpretation of the specs and assumes utf8 encoding for application/json, whereas Windows PowerShell is using the older interpretation of iso-8859-1 encoding.
For reference, this GitHub issue was the key to understanding all of this behaviour.
Finally...
...if you want a script to help debug these sorts of issues in future, I wrote one a while ago in this answer - https://stackoverflow.com/a/67182420/3156906. It takes the original text and the mangled text and tries to work out which pair of mismatched encodings was used to mangle the text. When I ran it with your text it gave me this:
original string = 'Mężczyzna'
mangled string = 'MÄżczyzna'
source encoding = 'utf-8'
target encoding = 'iso-8859-1'
original string = 'Mężczyzna'
mangled string = 'MÄżczyzna'
source encoding = 'utf-8'
target encoding = 'iso-8859-13'
original string = 'Mężczyzna'
mangled string = 'MÄżczyzna'
source encoding = 'utf-8'
target encoding = 'iso-8859-9'
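The general idea behind that kind of search can be sketched in a few lines. This is not the script from the linked answer, just a minimal brute-force version of the same approach: Find-EncodingMismatch is a hypothetical name, and the default candidate list is an assumption that is deliberately limited to a handful of encodings.

```powershell
function Find-EncodingMismatch {
    param(
        [string]   $Original,
        [string]   $Mangled,
        # assumed candidate list; extend it with whatever encodings are plausible in your case
        [string[]] $EncodingNames = @("utf-8", "utf-16", "iso-8859-1", "us-ascii")
    )

    foreach ($sourceName in $EncodingNames) {
        # encode the original text the way the server might have
        $bytes = [System.Text.Encoding]::GetEncoding($sourceName).GetBytes($Original)

        foreach ($targetName in $EncodingNames) {
            if ($targetName -eq $sourceName) { continue }

            # decode the bytes the way the client might have
            $candidate = [System.Text.Encoding]::GetEncoding($targetName).GetString($bytes)

            # ordinal comparison, so invisible control characters aren't ignored
            if ($candidate.Equals($Mangled)) {
                [pscustomobject] @{
                    SourceEncoding = $sourceName
                    TargetEncoding = $targetName
                }
            }
        }
    }
}

# usage: build the sample text from code points and mangle it the same way as the tl;dr snippet
$original = "M" + [char]0x0119 + [char]0x017C + "czyzna"
$mangled  = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetString(
    [System.Text.Encoding]::UTF8.GetBytes($original)
)
Find-EncodingMismatch -Original $original -Mangled $mangled
```

With a longer candidate list, several target encodings can produce the same mangled text, which is why the output above reports more than one match.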