
Is there a way to use supplementary Unicode characters (for example '🃜', U+1F0DC) as char literals in C#? I tried it in VS 2017 with the source file saved as UTF-8 with BOM, UTF-16 LE, and UTF-16 BE, and I always get the error "Too many characters in character literal".

Victor Grigoriu
  • The `char` type is effectively a single UTF-16 code unit. If the character is not a single UTF-16 code unit, then no. – Mike Zboray Feb 18 '17 at 19:20
  • FWIW it is possible to represent it as a string, `"\uD83C\uDCDC"`. – Mike Zboray Feb 18 '17 at 19:24
  • @mikez Note: It's not necessary to use the `\u` notation. – Tom Blodget Feb 18 '17 at 19:41
  • @mikez: Or just use `\U` instead: `"\U0001F0DC"` – Jon Skeet Feb 18 '17 at 19:41
  • @TomBlodget: It's not *necessary*, but it does mean you don't need to worry about encodings as much, if all your source code is ASCII. – Jon Skeet Feb 18 '17 at 19:42
  • @JonSkeet I wouldn't do that with source code files, because it could lead to making the same assumption about other text files, and that just won't do. – Tom Blodget Feb 18 '17 at 19:51
  • Out of curiosity, is there any language that treats code points as first-class concepts? – Victor Grigoriu Feb 18 '17 at 19:52
  • @TomBlodget: The difference is that I control how I treat other source files in my code, whereas it can (depending on platform, etc.) be slightly trickier, or at least annoying, to persuade all tools everywhere to handle source code as UTF-8. – Jon Skeet Feb 18 '17 at 22:14
  • @JonSkeet Yes, we do put up with cases of "the cobbler's children have no shoes" in our work. I've been lucky enough to be dogmatic with character encodings. – Tom Blodget Feb 19 '17 at 16:40
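A minimal sketch of the spellings discussed in the comments, assuming a C# 9+ compiler (top-level statements) and a source file saved as UTF-8 so the compiler reads the literal character correctly. The character is taken to be U+1F0DC, the code point the comments spell out. All three literals, plus `char.ConvertFromUtf32`, should yield the same two-code-unit string; the commented-out line is the char literal from the question, which does not compile.

```csharp
using System;

// Three spellings of U+1F0DC taken from the comments.
// The first one assumes the file is saved as UTF-8 (or another encoding the compiler detects).
string literal = "🃜";
string surrogates = "\uD83C\uDCDC";   // explicit UTF-16 surrogate pair
string scalar = "\U0001F0DC";         // \U takes the whole code point; the compiler emits the pair

// Builds the same string at run time from the numeric code point.
string fromCodePoint = char.ConvertFromUtf32(0x1F0DC);

Console.WriteLine(literal == surrogates);     // True
Console.WriteLine(surrogates == scalar);      // True
Console.WriteLine(scalar == fromCodePoint);   // True
Console.WriteLine(scalar.Length);             // 2: two UTF-16 code units

// char c = '🃜';  // error: "Too many characters in character literal", as in the question
```

Using `\U0001F0DC` keeps the source file pure ASCII, which is the trade-off Jon Skeet and Tom Blodget debate above.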

1 Answer


No, `char` is one UTF-16 code unit. `string` is a sequence of UTF-16 code units, so if you have a code point that UTF-16 encodes as two code units (a surrogate pair), use a string literal:

`"🃜"`
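To make the answer concrete, here is a small sketch (C# 9+ top-level statements assumed) that inspects the two `char` values inside the string and recombines them with `char.ConvertToUtf32`. That method returns an `int`, because a code point above U+FFFF does not fit in a `char`.

```csharp
using System;

// One code point, stored as two UTF-16 code units.
string card = "\U0001F0DC";

char high = card[0];   // first code unit: a high surrogate
char low = card[1];    // second code unit: a low surrogate

Console.WriteLine(card.Length);                       // 2
Console.WriteLine(((int)high).ToString("X4"));        // D83C
Console.WriteLine(((int)low).ToString("X4"));         // DCDC
Console.WriteLine(char.IsSurrogatePair(high, low));   // True

// Recombining the pair gives the code point, which needs an int rather than a char.
int codePoint = char.ConvertToUtf32(high, low);
Console.WriteLine(codePoint.ToString("X"));           // 1F0DC
```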
Tom Blodget
  • Right, I was just reading about how to get the codepoints out from a string: https://stackoverflow.com/questions/687359/how-would-you-get-an-array-of-unicode-code-points-from-a-net-string – Victor Grigoriu Feb 18 '17 at 19:50
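The linked question covers several approaches; what follows is only a sketch of one of them, not code taken from that page. `CodePoints` is a hypothetical helper name. It walks the string with `char.IsSurrogatePair(string, int)` and joins each valid pair with `char.ConvertToUtf32`. As an assumption, an unpaired surrogate is passed through as its own value instead of raising an error.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns one int per code point, combining surrogate pairs.
// Assumption: a lone (unpaired) surrogate is returned as its own code unit value.
static List<int> CodePoints(string s)
{
    var result = new List<int>();
    for (int i = 0; i < s.Length; i++)
    {
        if (char.IsSurrogatePair(s, i))
        {
            result.Add(char.ConvertToUtf32(s[i], s[i + 1]));
            i++; // skip the low surrogate that was just consumed
        }
        else
        {
            result.Add(s[i]); // BMP character or lone surrogate
        }
    }
    return result;
}

List<int> cps = CodePoints("A\U0001F0DCB");
Console.WriteLine(string.Join(" ", cps.ConvertAll(cp => "U+" + cp.ToString("X4"))));
// U+0041 U+1F0DC U+0042
```

On .NET Core 3.0 and later, `string.EnumerateRunes()` performs a similar walk and yields `System.Text.Rune` values, although it reports unpaired surrogates as U+FFFD rather than passing them through.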