
I want to support Unicode, and as many characters as possible, in my PowerShell script. As the encoding I want to use UTF-8. So for testing purposes I simply type this line and press Enter:

[char]0x02A7

And it successfully shows the character ʧ.

But when I try to display a Unicode character (> 0xFFFF):

[char]0x01F600  

It throws an error telling me that the value 128512 cannot be converted to System.Char. Instead it should show the smiley 😀.
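
For reference, the cast itself works right up to the 16-bit boundary (a quick check using the same integer-to-char conversion):

[char]0xFFFF   # still fits in a single UTF-16 code unit, so this works
[char]0x10000  # one past the boundary; fails with the same conversion error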

What is wrong here?

Edit:

As Jeroen Mostert stated in the comments, I have to use a different command for Unicode characters with code points above 0xFFFF. So I wrote this script:

$s = [Char]::ConvertFromUtf32(0x01F600)
Write-Host $s

In the PowerShell IDE I get a beautiful smiley 😀. But when I run the script standalone (in its own window) I don't get the smiley. Instead it shows two strange characters.

What is wrong here?
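
For what it's worth, the two strange characters are presumably the two UTF-16 code units (a surrogate pair) that make up the string; they can be inspected with nothing more than .NET's Char API:

$s = [Char]::ConvertFromUtf32(0x1F600)
$s.Length                                        # 2 - the string holds a surrogate pair
'0x{0:X4} 0x{1:X4}' -f [int]$s[0], [int]$s[1]    # 0xD83D 0xDE00
[Char]::IsHighSurrogate($s[0])                   # True
[Char]::IsLowSurrogate($s[1])                    # True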

zomega
  • In .NET, which PowerShell is based on, characters are 16-bit. You will have to figure out how to encode that symbol as two characters. – Lasse V. Karlsen Nov 19 '21 at 08:56
  • `char` is a 16-bit type and only holds 16-bit UTF-16 code units, not the full range of Unicode characters. Characters with code points outside this range have to be represented as a full `String` (`[Char]::ConvertFromUtf32(0x01F600)`); this string will be made up of two surrogate characters. Note that there is no such thing as a "3-byte Unicode character", and you have to be careful with terminology here lest you confuse yourself. Unicode characters have (numeric) code points, which are represented in different ways, with the number of bytes required depending on the encoding used. – Jeroen Mostert Nov 19 '21 at 08:57
  • @JeroenMostert thank you for this knowledge. I now get a beautiful smiley in the IDE. But if I run the script in a PowerShell terminal window (Win+X) it shows two strange characters. Do you know why? (Also see my edit) – zomega Nov 19 '21 at 15:23
  • @somega The answer is likely that the fonts that ship with the Console Host (the default terminal host in Windows) don't support smileys and other wide-chars :) – Mathias R. Jessen Nov 19 '21 at 15:25
  • Encoding issues are a whole different kettle of fish. See [this answer](https://stackoverflow.com/a/49481797/4137916) for a ton of gory details. You should see only *one* character, but that will probably still be a replacement character, because your console won't support emojis. You can verify that by trying to copy-paste the smiley directly to the prompt: it'll show up as a replacement character there too. Emoji support requires something like [Windows Terminal](https://learn.microsoft.com/windows/terminal/install); launching PS from there gives you emoji support by default. (See the encoding sketch after these comments.) – Jeroen Mostert Nov 19 '21 at 15:34
  • Emojis are 2 characters long. You'd have to do some surrogate math to make them yourself. https://stackoverflow.com/a/62391840/6654942 – js2010 Nov 19 '21 at 16:12
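
Following up on Jeroen Mostert's encoding comment, a common mitigation is to switch the console host's output encoding to UTF-8 before printing (a minimal sketch; whether the glyph actually renders as an emoji still depends on the terminal's font support, e.g. Windows Terminal):

# Switch console output to UTF-8; rendering still depends on font support
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
Write-Host ([Char]::ConvertFromUtf32(0x1F600))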

1 Answer


Aside from [Char]::ConvertFromUtf32(), here's a way to calculate the surrogate pair by hand for code points above 0xFFFF, i.e. those too large for a single 16-bit code unit (http://www.russellcottrell.com/greek/utilities/surrogatepaircalculator.htm):

$S = 0x1F600                                                  # the code point (U+1F600)
[int]$H = [Math]::Truncate(($S - 0x10000) / 0x400) + 0xD800   # high surrogate: 0xD83D
[int]$L = ($S - 0x10000) % 0x400 + 0xDC00                     # low surrogate: 0xDE00
[char]$H + [char]$L                                           # concatenates the pair into the 😀 glyph
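
As a sanity check, the pair can be recombined into the original code point with the ConvertToUtf32 overload that takes a high and a low surrogate:

'0x{0:X}' -f [Char]::ConvertToUtf32([char]$H, [char]$L)   # 0x1F600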


js2010
  • This is the same as UnicodeHexHTML emoji decoding. Thank you. I was too lazy to dig through the archives about high and low bytes and all sorts of other details. – Garric Mar 25 '23 at 04:21