For a REST call, I need the German word "Stück" in UTF-8, as read from an Access database with

$conn = New-Object System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=$filename;Persist Security Info=False;")

and I am trying to convert it. I have found that PowerShell ISE seems to encode string constants in ANSI, so I wrote a minimal test without the database and got the same result:

$Text1 = "Stück" # entered via ISE, this is also what I get from the database
# ($StringFromDatabase -eq $Text1) shows $true

$enc = [System.Text.Encoding]::GetEncoding(1252).GetBytes($Text1)
# also tried [System.Text.Encoding]::GetEncoding("ISO-8859-1") # = 28591

$Text1 = [System.Text.Encoding]::UTF8.GetString($enc)

$Text1
$Text1 = "Stück" # = UTF-8, entered here with Notepad++, encoding set to UTF-8
"must see: $Text1"

So I get two outputs: the converted one shows "St?ck", but I need to see "Stück".

OliverLx
  • You've got it backwards - sounds like you want `UTF.GetBytes($Text1)` and then back to `ASCII.GetString($enc)` – Mathias R. Jessen Sep 07 '21 at 14:32
  • As an aside: The PowerShell ISE is [no longer actively developed](https://docs.microsoft.com/en-us/powershell/scripting/components/ise/introducing-the-windows-powershell-ise#support) and [there are reasons not to use it](https://stackoverflow.com/a/57134096/45375) (bottom section), notably not being able to run PowerShell (Core) 6+. The actively developed, cross-platform editor that offers the best PowerShell development experience is [Visual Studio Code](https://code.visualstudio.com/) with its [PowerShell extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode.PowerShell). – mklement0 Sep 07 '21 at 15:45
  • As an aside: Please avoid pseudo method syntax: instead of `New-Object SomeType(arg1, ...)`, use `New-Object SomeType [-ArgumentList] arg1, ...` - PowerShell cmdlets, scripts and functions are invoked like _shell commands_, not like _methods_. That is, no parentheses around the argument list, and _whitespace_-separated arguments (`,` constructs an _array_ as a _single argument_, as needed for `-ArgumentList`). See [this answer](https://stackoverflow.com/a/50636061/45375) – mklement0 Sep 07 '21 at 15:46
  • @mklement0 Thank you for the hint with ISE problems. Locally I am using Visual Studio 2019 and VS Code but edit/run the currently mentioned script on a customer system within ISE in TeamViewer co-working sessions without any problems. But good to know! – OliverLx Sep 12 '21 at 07:46

1 Answer

that PowerShell ISE seems to encode string constants in ANSI.

That only applies when communicating with external programs, whereas you're using in-process .NET APIs.

As an aside: this discrepancy with regular console windows, which use the active OEM code page, is one of the reasons the obsolescent ISE is problematic; see the bottom section of this answer for more information.

String literals in memory are always .NET strings, which are UTF-16-encoded (composed of 16-bit Unicode code units), capable of representing all Unicode characters.[1]
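As a minimal sketch of why the conversion attempt in the question yields "St?ck": the string is encoded to bytes with one encoding and then decoded with a different one. The variable names below are illustrative, the string is built with `[char]0xFC` so that the script file's own encoding plays no role, and ISO-8859-1 (code page 28591, one of the two encodings tried in the question) stands in for the ANSI code page.

```powershell
# Build "Stück" without depending on how the script file itself is encoded.
$text = 'St' + [char]0xFC + 'ck'

# Correct: encode the .NET string as UTF-8. U+00FC becomes the two bytes C3 BC.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($text)
$utf8Hex   = ($utf8Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
$utf8Hex   # 53 74 C3 BC 63 6B

# Decoding with the SAME encoding round-trips to the original string.
$roundTrip = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

# The question's approach: single-byte encoding, then decoding as UTF-8.
# The lone byte FC is not valid UTF-8, so it decodes to U+FFFD,
# which many consoles display as "?".
$latin1Bytes = [System.Text.Encoding]::GetEncoding(28591).GetBytes($text)
$broken      = [System.Text.Encoding]::UTF8.GetString($latin1Bytes)
```

In other words, the in-memory string was never "ANSI"; only the explicit byte conversion introduced the mismatch.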


Character encoding in web-service calls (Invoke-RestMethod, Invoke-WebRequest):

To send UTF-8 strings, specify charset=utf-8 as part of the -ContentType argument; e.g.:

Invoke-RestMethod -ContentType 'text/plain; charset=utf-8' ...
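A slightly fuller sketch of such a call, under stated assumptions: the URI is a placeholder, the JSON payload is made up for illustration, and the body is additionally pre-encoded as a UTF-8 byte array (to my knowledge a `byte[]` passed to `-Body` is sent unmodified, which removes any doubt about how a string body would be encoded). The actual request is left commented out so that the snippet runs without a network connection.

```powershell
# Hypothetical JSON payload; [char]0xFC is "ü", so the file encoding doesn't matter.
$json = '{"unit":"St' + [char]0xFC + 'ck"}'

# Encode the body explicitly as UTF-8.
$bodyBytes = [System.Text.Encoding]::UTF8.GetBytes($json)

# Collect the arguments for splatting.
$params = @{
    Uri         = 'https://example.org/api/products/1'   # placeholder endpoint
    Method      = 'Patch'
    ContentType = 'application/json; charset=utf-8'
    Body        = $bodyBytes
}

# Uncomment to send the request to a real endpoint:
# $result = Invoke-RestMethod @params
```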

On receiving strings, PowerShell automatically decodes them either based on an explicitly specified charset field (character encoding) in the response's Content-Type header or, in its absence, using ISO-8859-1 (which is closely related to, but in effect a subset of, Windows-1252).

  • If a given response doesn't specify a charset but actually uses an encoding other than ISO-8859-1 (say, UTF-8), PowerShell will misinterpret the strings received, which requires re-encoding after the fact - see this answer.
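The re-encoding mentioned above can be sketched as follows. The misdecoded string is simulated locally rather than taken from a real response, and the variable names are illustrative. Because ISO-8859-1 maps every byte value to exactly one character, converting the garbled string back to bytes with that encoding recovers the original UTF-8 bytes losslessly.

```powershell
$latin1 = [System.Text.Encoding]::GetEncoding(28591)   # ISO-8859-1
$utf8   = [System.Text.Encoding]::UTF8

# Simulate a UTF-8 response that was wrongly decoded as ISO-8859-1.
$original   = 'St' + [char]0xFC + 'ck'                  # "Stück"
$misdecoded = $latin1.GetString($utf8.GetBytes($original))
# $misdecoded is now 6 characters long: the two UTF-8 bytes C3 BC
# were each turned into a separate character.

# Repair: recover the raw bytes, then decode them as UTF-8.
$repaired = $utf8.GetString($latin1.GetBytes($misdecoded))
```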

Character encoding when communicating with external programs:

If you need to send a string with a particular encoding to an external program (via the pipeline, which the target program receives via stdin), set the $OutputEncoding preference variable to that encoding, and PowerShell will automatically convert your .NET strings to the specified encoding.

To send UTF-8-encoded strings to external programs via the pipeline:

$OutputEncoding = [System.Text.UTF8Encoding]::new()

Note, however, that this alone isn't sufficient in order to correctly receive UTF-8 output from external programs; for that, you need to set [Console]::OutputEncoding to the same encoding.

To make your PowerShell session fully UTF-8-aware (irrespective of whether in the ISE or a regular console window):

# Needed in the ISE only:
chcp >$null # Dummy console-program call that ensures that a console is allocated.

# Set all encodings relevant to communicating with external programs to UTF-8.
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding =
  [System.Text.UTF8Encoding]::new()

See this answer for more information.


[1] Note, however, that Unicode characters with a code point greater than 0xFFFF, i.e. those outside the so-called BMP (Basic Multilingual Plane), must be represented with two 16-bit code units ([char]), namely so-called surrogate pairs.
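A small illustration of the surrogate-pair point, using U+1F600 as an arbitrary example of a code point outside the BMP (the choice of character is mine, not from the original answer):

```powershell
# Create a string holding the single code point U+1F600.
$s = [char]::ConvertFromUtf32(0x1F600)

$s.Length                          # 2: one code point, two UTF-16 code units
[char]::IsHighSurrogate($s[0])     # True
[char]::IsLowSurrogate($s[1])      # True

# Recover the code point from the surrogate pair.
$codePoint = [char]::ConvertToUtf32($s, 0)

# In UTF-8 the same code point takes four bytes.
$utf8Count = [System.Text.Encoding]::UTF8.GetBytes($s).Count
```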

mklement0
    Thank you very much. The trick was to add the content type in both the header and as additional parameter in **Invoke-RestMethod**. I already used in my `function CheckAccessToken([ref]$expires, [ref]$headers): ... $headers.Value = @{ ... "Content-type" = "application/json; charset=UTF-8" Accept = "application/json" }` and called `$result = Invoke-RestMethod -Uri $productUrl -Method Patch -Headers $headers -Body $body` but had to add `-ContentType "application/json; charset=UTF-8"` there. Thank you so much again :-). – OliverLx Sep 12 '21 at 07:39