I am parsing some XML text through an API without saving the actual file and have run into an issue when the text includes characters from other languages.

When trying to convert 'ë' or others like this, I end up with garbled text such as Ã© instead. Is there a way to change the encoding of a variable in memory, given that I am not using any files?

Any help would be greatly appreciated.

mklement0
  • 3
    You need to show us, at a minimum, the code you are using to change the text, preferably how you are getting the data in the first place as you might have better control there. – Matt Oct 30 '18 at 19:47
  • I was thinking `$results = Invoke-RestMethod -Uri $uri -Method Post -Body $body` would not have provided much information. From there I am doing a foreach on the results and adding it to an array: `$newArrayItemC | Add-Member -type NoteProperty -Name 'mbid' -Value ($conAtt.'#text')`. The last part, `$conAtt.'#text'`, is what is auto-converting for me, and I need to find a way to force the encoding on a variable, as I have realized I will not be able to just go after the raw data since PowerShell is doing some conversion for me. – ToxicApotheke Oct 30 '18 at 19:55
  • 1
    What I was trying to draw more attention to was: _the code you are using to __change the text__,_ Yes, that code you show does not help but could we see the `$uri` and `$body`? – Matt Oct 30 '18 at 19:56
  • 1
    As I expanded upon above there is nothing that I am doing to change the text. At the root of the issue I am trying to find a way to force a type of encoding on a variable without using files. – ToxicApotheke Oct 30 '18 at 20:02
  • Are you trying to adjust something like `-ContentType 'application/json; charset=utf-8'`? Also, _When trying to convert_ is confusing when you say _there is nothing that I am doing to change the text._ I did not see your expanded comment at the time. So does it look fine when you print it to a file, or do you mean that is the character you are supposed to see? – Matt Oct 30 '18 at 20:07

1 Answer

It looks like the character encoding of the original text was misinterpreted when the text was converted to .NET strings.

Specifically, it looks like UTF-8-encoded text was misinterpreted as "ANSI"-encoded or, in the context of cmdlets such as Invoke-WebRequest, as a similar fixed-width single-byte encoding such as ISO-8859-1, so that each byte in the UTF-8 input became a character in its own right, even though UTF-8 encodes non-ASCII-range characters as multiple bytes.
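The misinterpretation described above can be reproduced in memory. The following is only an illustrative sketch: it assumes ISO-8859-1 (code page 28591) as the mistakenly applied encoding, and the variable names are made up for the example.

```powershell
# Build 'é' (U+00E9) from its code point so that the script file's own
# encoding does not matter.
$original = [string][char]0xE9

# UTF-8 encodes this character as two bytes: 0xC3 0xA9.
$utf8Bytes = [Text.Encoding]::UTF8.GetBytes($original)

# Decoding those bytes as ISO-8859-1 turns each byte into a character of its own.
$misread = [Text.Encoding]::GetEncoding(28591).GetString($utf8Bytes)

$utf8Bytes.Count   # 2
$misread.Length    # 2: the characters U+00C3 (Ã) and U+00A9 (©)
```

One character has become two, which is the pattern seen in the garbled output.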

To correct this problem, you must re-encode the string:

  • convert the misinterpreted string back to bytes using the input string's mistakenly applied encoding, so as to get the original byte representation.

  • then reconvert these bytes back to a string using the true encoding, namely UTF-8.

# Note: Works in Windows PowerShell only - in PowerShell Core,
# [Text.Encoding]::Default is *invariably* UTF-8.
$originalBytes = [Text.Encoding]::Default.GetBytes('Ã©')
[Text.Encoding]::Utf8.GetString($originalBytes)

The above yields é.

In Windows PowerShell, [Text.Encoding]::Default is your system's "ANSI" encoding; for ISO-8859-1 encoding, use [Text.Encoding]::GetEncoding(28591) instead.
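The two re-encoding steps could also be wrapped in a small helper that names the mistaken encoding explicitly rather than relying on [Text.Encoding]::Default, so that it behaves the same in both editions. This is a sketch only: the function name Repair-MisdecodedText and its parameters are hypothetical, and the ISO-8859-1 default is an assumption about which encoding was misapplied.

```powershell
# Hypothetical helper (not part of PowerShell): undo a misdecoding of UTF-8
# text as a single-byte encoding.
function Repair-MisdecodedText {
    param(
        [Parameter(Mandatory)]
        [string] $Text,

        # Code page that was mistakenly applied; 28591 is ISO-8859-1 (assumed default).
        [int] $WrongCodePage = 28591
    )

    # Step 1: recover the original bytes via the mistakenly applied encoding.
    $originalBytes = [Text.Encoding]::GetEncoding($WrongCodePage).GetBytes($Text)

    # Step 2: decode those bytes with the true encoding, UTF-8.
    [Text.Encoding]::UTF8.GetString($originalBytes)
}

# Build the garbled sample (U+00C3, U+00A9) from code points so that the
# script file's own encoding cannot interfere.
$garbled = -join [char[]](0xC3, 0xA9)

Repair-MisdecodedText -Text $garbled   # é
```

Apply such a helper only to strings known to be misdecoded: a string that already contains a correct non-ASCII character such as é would not survive the round trip, because its single byte is not valid UTF-8 on its own.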

Note that the entire problem would not have arisen in PowerShell Core, which consistently defaults to (BOM-less) UTF-8.
Should you find yourself in need of using the "ANSI" encoding even in PowerShell Core, see this answer.

mklement0
  • Thanks! Just another reason I hate PowerShell. In your opinion, is PowerShell Core "new and improved" or just "new"? I have a running collection of things I hate about PowerShell, including the GitHub PowerShellGet issues you've seen me log. – John Zabroski Mar 25 '19 at 23:02
  • 1
    @JohnZabroski, to me the short of it is that PowerShell Core is _partly_ improved, but still too beholden to backward compatibility; the primary GitHub issue that collects existing grievances and calls for a potential future version [policy] that allows fundamental breaking changes in order to shed historical baggage is https://github.com/PowerShell/PowerShell/issues/6745 – mklement0 Mar 26 '19 at 02:12