
How are 4-byte characters represented in C#? As one char, or as a pair of chars?

var someCharacter = 'x'; // imagine a 4-byte UTF-16 character here
hippietrail
SiberianGuy
  • Could you give an example of a '4 bytes char'? It would make your question clearer. – jv42 Oct 20 '11 at 09:12
  • @jv42, there are some UTF-16 characters that cannot be represented in 2 bytes: any character whose code point does not fit in 16 bits (that is, above U+FFFF). – SiberianGuy Oct 20 '11 at 09:14
  • See "Unicode and .NET" article by Jon Skeet - http://csharpindepth.com/Articles/General/Unicode.aspx – sll Oct 20 '11 at 09:16
  • I know those chars exist; providing an example would have made sure there was no typo in the question, especially as 'char' and 'character' are easily confused. – jv42 Oct 20 '11 at 15:11
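To make the "2 bytes versus 4 bytes" point from the comments concrete, here is a minimal sketch that asks the UTF-16 encoder how many bytes each character needs. It assumes C# 9+ top-level statements (.NET 5 or later) and uses `Encoding.Unicode`, which is .NET's UTF-16 little-endian encoding; `GetByteCount` does not include a byte-order mark.

```csharp
using System;
using System.Text;

// U+0078 ('x') lies inside the Basic Multilingual Plane: one 16-bit code unit.
int bmpBytes = Encoding.Unicode.GetByteCount("x");

// U+1D11E lies above U+FFFF: it needs two 16-bit code units.
int astralBytes = Encoding.Unicode.GetByteCount("\U0001D11E");

Console.WriteLine($"U+0078 needs {bmpBytes} bytes in UTF-16");
Console.WriteLine($"U+1D11E needs {astralBytes} bytes in UTF-16");
```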

1 Answer


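C# can only store characters from the Basic Multilingual Plane (U+0000 to U+FFFF) in the char type, because a char is a single 16-bit UTF-16 code unit. A character outside this plane must be stored as two chars, called a surrogate pair: a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF).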
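The split into a surrogate pair is plain arithmetic: subtract 0x10000 to get a 20-bit value, put its top 10 bits in the high surrogate and its bottom 10 bits in the low surrogate. The sketch below does this by hand and compares the result with the framework's `char.ConvertFromUtf32`. `SplitToSurrogates` is a hypothetical helper written only for this illustration, and the code assumes C# 9+ top-level statements.

```csharp
using System;

// Hypothetical helper: splits a supplementary-plane code point into UTF-16 surrogates.
(char High, char Low) SplitToSurrogates(int codePoint)
{
    if (codePoint < 0x10000 || codePoint > 0x10FFFF)
        throw new ArgumentOutOfRangeException(nameof(codePoint),
            "Only code points from U+10000 to U+10FFFF need a surrogate pair.");

    int offset = codePoint - 0x10000;             // 20-bit value
    char high = (char)(0xD800 + (offset >> 10));  // top 10 bits -> high surrogate
    char low = (char)(0xDC00 + (offset & 0x3FF)); // bottom 10 bits -> low surrogate
    return (high, low);
}

var (hi, lo) = SplitToSurrogates(0x1D11E);
Console.WriteLine($"U+1D11E -> {(int)hi:X4} {(int)lo:X4}"); // D834 DD1E

// The framework performs the same split and returns a two-char string.
string viaFramework = char.ConvertFromUtf32(0x1D11E);
Console.WriteLine(viaFramework[0] == hi && viaFramework[1] == lo); // True
```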

You can also use a string literal such as:

string s = "\U0001D11E"; // U+1D11E MUSICAL SYMBOL G CLEF, stored as two chars
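Because such a string holds two chars, `Length` and indexing work in UTF-16 code units, not in characters. The sketch below shows the `System.Char` surrogate checks on that literal and counts code points by stepping over surrogate pairs. `CountCodePoints` is a hypothetical helper for this illustration (a lone, unpaired surrogate is simply counted as one unit); it assumes C# 9+ top-level statements.

```csharp
using System;

string s = "\U0001D11E"; // U+1D11E MUSICAL SYMBOL G CLEF

Console.WriteLine(s.Length);                                // 2: two UTF-16 code units
Console.WriteLine(char.IsHighSurrogate(s[0]));              // True
Console.WriteLine(char.IsLowSurrogate(s[1]));               // True
Console.WriteLine(char.IsSurrogatePair(s[0], s[1]));        // True
Console.WriteLine(char.ConvertToUtf32(s, 0).ToString("X")); // 1D11E

// Hypothetical helper: counts code points, treating a valid surrogate pair as one.
int CountCodePoints(string text)
{
    int count = 0;
    for (int i = 0; i < text.Length; i++)
    {
        if (char.IsHighSurrogate(text[i]) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            i++; // skip the low half of the pair
        count++;
    }
    return count;
}

Console.WriteLine(CountCodePoints("a" + s + "b")); // 3, although the string has 4 chars
```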

See UTF-16.

Mark Byers