
I've got a program that uses a few hash tables to resolve information. I'm getting some weird issues with foreign characters. Below is an accurate representation:

$Props =
@{
    P1  = 'Norte Americano e Inglês'
}

$Expressions =
@{
    E1  = { $Props['P1'] }
}

& $Expressions['E1']

If I paste this into a PowerShell 5.1 console or run the selection in VSCode, I get:

Norte Americano e Inglês

As expected. But if I run the code in VSCode (hit F5), I get:

Norte Americano e Inglês

By debugging, setting a breakpoint right after the hash literal, I can tell the incorrect version is actually in the hash. So this isn't somehow a side effect of the call operator or the use of script blocks.

I attempted to set the output encoding like:

$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding

But this doesn't seem to change the pattern. Frankly, I'm surprised the console is handling Unicode so well in the first place. However, I can't understand the inconsistency. Ultimately this data is written to an AD attribute which again works fine if I execute the steps manually, but gets mangled if I actually run the script, even when the output encoding is set as previously mentioned.

I did look through this Q&A, but I don't seem to be having a console display issue, although that may be a result of the TrueType fonts. Perhaps they're masking the problem.

Interestingly it does seem to work correctly in VSCode if I switch it to PowerShell 7.1. However, because of integration with the AD cmdlets, which do not function well through implicit session compatibility, it's not possible to use PowerShell Core for this project.

The dev environment is Windows Server 2012 R2, fully up to date. I'm not sure it's possible to change the system code page there, as is mentioned for Win 10 (1909).

Steven
  • Since the problem occurs with a _string literal_ in your _source code_, the likeliest explanation is that your _script file is misinterpreted by PowerShell_, which happens if the script is saved as UTF-8 _without a BOM_. Try saving your script as UTF-8 _with BOM_; see [this answer](https://stackoverflow.com/a/54790355/45375) for more information. – mklement0 Jun 04 '21 at 18:24
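
For anyone wanting to try the suggestion in the comment above, here is a minimal sketch of re-saving a script as UTF-8 with a BOM from PowerShell itself. `Add-Utf8Bom` is a hypothetical helper name, not a built-in cmdlet, and the sketch assumes the file is currently valid UTF-8 (with or without a BOM).

```powershell
# Hypothetical helper: re-save a script as UTF-8 *with* a BOM so that
# Windows PowerShell 5.1 decodes it as UTF-8 rather than the ANSI code page.
function Add-Utf8Bom {
    param([Parameter(Mandatory)][string] $Path)

    # Resolve to a full path; .NET file APIs do not follow PowerShell's current location.
    $fullPath = Convert-Path -LiteralPath $Path

    # Decode as UTF-8. If a BOM is already present it is detected and not kept in the text.
    $text = [System.IO.File]::ReadAllText($fullPath, [System.Text.Encoding]::UTF8)

    # UTF8Encoding($true) emits the EF BB BF preamble when writing.
    $utf8WithBom = New-Object System.Text.UTF8Encoding $true
    [System.IO.File]::WriteAllText($fullPath, $text, $utf8WithBom)
}
```

Usage would look like `Add-Utf8Bom -Path .\MyScript.ps1`, where the file name is a placeholder. In VSCode the same result should be reachable by clicking the encoding in the status bar and choosing "Save with Encoding", then "UTF-8 with BOM".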

1 Answer


This is pretty ugly, but what happens if you try this at the end of your code:

$enc = [System.Text.Encoding]::UTF8
$enc.GetString($enc.GetBytes($(& $Expressions['E1'])))

Also, this might help you: Encode a string in UTF-8

Santiago Squarzon
  • That's definitely helping. Let me get this into the main code and close this issue out in the morning. This seems to be working too `[System.Text.Encoding]::UTF8.GetString( [Char[]](& $Expressions['E1']) )` but I have to see it through to the output. THANKS! – Steven Mar 19 '21 at 04:21
  • This worked well under the circumstances. Though I would still like an explanation of the observed behavior. At any rate, I ran with your sample adjusted as mentioned and packed it in a function for convenience. Thanks again! – Steven Mar 19 '21 at 17:06
  • Glad it worked Steven. My guess is that by default PS std out seems to be UTF8-noBOM and we're forcing UTF8-BOM here. Though I thought by default PS was always using UTF8-BOM, this is strange behavior for me too :P – Santiago Squarzon Mar 19 '21 at 17:19
  • [This answer](https://stackoverflow.com/a/54790355/45375) is probably the right solution to the problem: if you save the script file as UTF-8 _with BOM_, Windows PowerShell no longer misinterprets it (PowerShell Core defaults to UTF-8, and therefore reads it correctly even without a BOM). Your solution attempt tries to fix the already-misinterpreted string after the fact, but this isn't a complete solution, because certain Unicode characters can break reading of the script altogether, such as an `€`. – mklement0 Jun 04 '21 at 19:59
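
To make the misinterpretation described in the comments concrete, the following sketch reproduces the mangling in memory and then reverses it. It assumes the system ANSI code page is Windows-1252, which is a guess about the environment in the question. The variable names are illustrative only.

```powershell
# Build the sample from a code point so this snippet does not itself depend on
# how the file is saved. U+00EA is the accented e in the question's string.
$original = 'Ingl' + [char]0x00EA + 's'

$utf8 = [System.Text.Encoding]::UTF8
# Assumption: the ANSI code page that Windows PowerShell falls back to is 1252.
$ansi = [System.Text.Encoding]::GetEncoding(1252)

# What a BOM-less UTF-8 file holds on disk: U+00EA becomes the two bytes C3 AA.
$bytesOnDisk = $utf8.GetBytes($original)

# What results when those bytes are decoded as Windows-1252 instead:
# C3 becomes U+00C3 and AA becomes U+00AA, i.e. two characters instead of one.
$misread = $ansi.GetString($bytesOnDisk)

# Reversing the mistake: re-encode with the wrong encoding, decode with the right one.
$repaired = $utf8.GetString($ansi.GetBytes($misread))
```

Here `$misread` is the mangled form shown in the question and `$repaired` matches `$original` again. A repair like this only runs after the script has been parsed, so it cannot help in the case mentioned above where a misread character prevents the script from being read at all. Saving the file with a BOM avoids the problem at the source.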