
I am experimenting with escape sequences and cannot really use the \U sequence (UTF-32). It does not compile because the compiler cannot recognize the sequence for some reason; it recognizes it as UTF-16.

Could you please help me?

Console.WriteLine("\U00HHHHHH");

    `H` here means "Hex". It represents a placeholder, not a literal "H" char. You need to replace all the "H" characters with hexadecimal digits within the specified range. – 41686d6564 stands w. Palestine Mar 20 '22 at 12:42
  • Console.WriteLine("\U001effff"); this does not work either – Eric Movsessian Mar 20 '22 at 12:55
  • I think I figured it out, it has to be in the format of \U00..... Why do you have to put two zeros after the \U? This works: Console.WriteLine("\U0001F47D"); Also, why does the console print a question mark and not the symbol itself? – Eric Movsessian Mar 20 '22 at 13:03
  • Your doc screenshot shows the range (your attempt from the comment is out of range) plus an example – Hans Kesting Mar 20 '22 at 13:03
  • Do you mean this does not work: Console.WriteLine("\U0001F47D")? I mean, it compiled and ran but displayed a question mark – Eric Movsessian Mar 20 '22 at 13:08
  • Unicode escapes *require* 8 digits after `\U`, and the value must be a valid 0-10FFFF constant. You're getting a question mark because the font used for display doesn't contain the glyph for the character you are printing, or possibly the terminal used only supports the "basic multilingual plane" (BMP) characters 0-FFFF. – Mark Tolonen Mar 20 '22 at 16:57

1 Answer

Your problem is that you copied \U00HHHHHH from the documentation page Strings (C# Programming Guide): String Escape Sequences:

(screenshot of the escape-sequence table from that documentation page)

But \U00HHHHHH is not itself a valid UTF-32 escape sequence -- it's a mask where each H indicates where a hex digit must be typed. The reason it's not valid is that hexadecimal numbers consist of the digits 0-9 and the letters A-F or a-f -- and H is not one of these characters. And the literal mentioned in the comments, "\U001effff", does not work because it falls outside the range of valid UTF-32 character values specified immediately thereafter in the docs:

(range: 000000 - 10FFFF; example: \U0001F47D = "👽")

The C# compiler actually checks whether the specified UTF-32 character is valid according to these rules:

// These compile because they're valid Hex numbers in the range 000000 - 10FFFF padded to 8 digits with leading zeros:
Console.WriteLine("\U0001F47D");
Console.WriteLine("\U00000000");
Console.WriteLine("\U0010FFFF");
// But these don't.
// H is not a valid Hex character:
// Compilation error (line 16, col 22): Unrecognized escape sequence
Console.WriteLine("\U00HHHHHH");
// This is outside the range of 000000 - 10FFFF:
// Compilation error (line 19, col 22): Unrecognized escape sequence
Console.WriteLine("\U001effff");

See https://dotnetfiddle.net/KezdTG.
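
If you need to produce such a character at run time rather than with a compile-time escape sequence, a minimal sketch using char.ConvertFromUtf32 (which performs an equivalent range check and throws for invalid code points) could look like this:

// Builds the same string as "\U0001F47D", but the range check happens at run time:
Console.WriteLine(char.ConvertFromUtf32(0x1F47D));
// Values outside 000000 - 10FFFF (or surrogate code points) throw ArgumentOutOfRangeException,
// e.g. char.ConvertFromUtf32(0x1EFFFF) -- the same value that fails to compile as "\U001effff".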

As an aside, to properly display Unicode characters in the Windows console, see How to write Unicode characters to the console?.
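
A minimal sketch of one common approach from that question -- setting the console output encoding to UTF-8 before writing (the console font must still contain the glyph for the character to render):

using System;
using System.Text;

class Program
{
    static void Main()
    {
        // Emit UTF-8 so characters outside the BMP aren't replaced with '?'.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine("\U0001F47D");
    }
}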
