Unicode characters beyond the 16-bit Basic Multilingual Plane (BMP), that is, those that require surrogate pairs in languages that use UTF-16 as their native text encoding.
Questions tagged [astral-plane]
43 questions
124
votes
3 answers
What are the most common non-BMP Unicode characters in actual use?
In your experience, which Unicode characters, code points, or ranges outside the BMP (Basic Multilingual Plane) are the most common so far? These are the ones that require 4 bytes in UTF-8 or surrogates in UTF-16.
I would've expected the answer to be…

hippietrail
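As a rough illustration of the size claim in the question above (4 bytes in UTF-8, a surrogate pair in UTF-16), the following JavaScript sketch measures both encodings for a single code point. The helper name `encodedSizes` is made up for this example, and `TextEncoder` is assumed to be available as a global, as it is in modern browsers and recent Node.js versions.

```javascript
// Hypothetical helper (not from the question): report how many UTF-8 bytes
// and UTF-16 code units a single Unicode code point occupies.
function encodedSizes(codePoint) {
  const s = String.fromCodePoint(codePoint);
  return {
    // TextEncoder always produces UTF-8 (assumed to exist as a global here).
    utf8Bytes: new TextEncoder().encode(s).length,
    // String length in JavaScript counts UTF-16 code units.
    utf16Units: s.length,
  };
}
```

Under these assumptions, a BMP character such as U+6C34 should come out as 3 bytes and 1 code unit, while U+1F618 should come out as 4 bytes and 2 code units.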
41
votes
6 answers
JavaScript strings outside of the BMP
BMP being Basic Multilingual Plane
According to JavaScript: The Good Parts:
JavaScript was built at a time when Unicode was a 16-bit character set, so all characters in JavaScript are 16 bits wide.
This leads me to believe that JavaScript uses…

Delan Azabani
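To make the "16 bits wide" point in the question above concrete, here is a small sketch of how a non-BMP character looks from inside a JavaScript string. It assumes an ES2015 or later engine (for the `\u{...}` escape, `codePointAt`, and string iteration); the variable names are only illustrative.

```javascript
// U+1F618 lies outside the BMP, so it is stored as two UTF-16 code units.
const kiss = "\u{1F618}";

// length and charCodeAt operate on UTF-16 code units, so the pair is visible.
const unitCount = kiss.length;                  // number of code units
const high = kiss.charCodeAt(0).toString(16);   // first half of the pair
const low = kiss.charCodeAt(1).toString(16);    // second half of the pair

// codePointAt and string iteration operate on whole code points instead.
const codePoint = kiss.codePointAt(0).toString(16);
const codePoints = Array.from("a" + kiss + "b", (ch) => ch.codePointAt(0));
```

Here `Array.from` is used because it iterates the string by code point, whereas indexing with `[i]` or `charAt(i)` would split the pair.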
27
votes
4 answers
What version of Unicode is supported by which .NET platform and on which version of Windows in regards to character classes?
Updated question ¹
With regards to character classes, comparison, sorting, normalization and collations, what Unicode version or versions are supported by which .NET platforms?
Original question
I remember somewhat vaguely having read that .NET…

Abel
21
votes
6 answers
How would you get an array of Unicode code points from a .NET String?
I have a list of character range restrictions that I need to check a string against, but the char type in .NET is a UTF-16 code unit, so some characters become wacky (surrogate) pairs instead. Thus when enumerating all the chars in a string, I don't…

Neil C. Obremski
19
votes
2 answers
Java regex match characters outside Basic Multilingual Plane
How can I match characters from outside the Unicode Basic Multilingual Plane in Java, with the intention of removing them?

ʞɔıu
17
votes
4 answers
char to Unicode more than U+FFFF in java?
How can I display a Unicode Character above U+FFFF using char in Java?
I need something like this (if it were valid):
char u = '\u+10FFFF';

liuyuqing
17
votes
4 answers
Java charAt used with characters that have two code units
From Core Java, vol. 1, 9th ed., p. 69:
The character ℤ requires two code units in the UTF-16 encoding. Calling
String sentence = "ℤ is the set of integers"; // for clarity; not in book
char ch = sentence.charAt(1)
doesn't return a space but the…

Patrick Brinich-Langlois
14
votes
4 answers
Unicode characters from charcode in JavaScript for charcodes > 0xFFFF
I need to get a string / char from a Unicode charcode and finally put it into a DOM TextNode to add to an HTML page using client-side JavaScript.
Currently, I am doing:
String.fromCharCode(parseInt(charcode, 16));
where charcode is a hex string…

leemes
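For context on the question above: `String.fromCharCode` takes UTF-16 code units, so a value above 0xFFFF is truncated to 16 bits rather than producing the intended character. A minimal sketch of two ways around that follows. The helper names `toSurrogatePair` and `fromHexCodePoint` are invented for this example, and `String.fromCodePoint` is assumed to be available (ES2015 or later).

```javascript
// Hypothetical helper: build the UTF-16 surrogate pair for a code point
// above 0xFFFF by hand, using only String.fromCharCode.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;        // 20 bits remain after the offset
  const high = 0xd800 + (offset >> 10);      // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);     // bottom 10 bits
  return String.fromCharCode(high, low);
}

// Hypothetical helper: turn a hex string such as "1f618" into a string,
// preferring String.fromCodePoint where the engine provides it.
function fromHexCodePoint(hex) {
  const codePoint = parseInt(hex, 16);
  if (typeof String.fromCodePoint === "function") {
    return String.fromCodePoint(codePoint);
  }
  return codePoint <= 0xffff
    ? String.fromCharCode(codePoint)
    : toSurrogatePair(codePoint);
}
```

The resulting string can then be passed to `document.createTextNode` like any other string; the manual fallback is only needed on engines that predate `String.fromCodePoint`.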
9
votes
4 answers
In Windows, how do you enter a character outside of the Unicode Basic Multilingual Plane?
I know that Windows has supported supplemental planes since Windows XP.
I have fonts which I know have characters outside the basic multilingual plane (BMP).
For these characters, the Unicode codepoint consists of five hexadecimal digits.
I do not…

yam655
9
votes
2 answers
How to enter non-BMP unicode (hexadecimal with more than 4 characters) as input to Mathematica
Problem description:
Mathematica uses "\:nnnn" as the syntax for Unicode input. E.g., if we enter "\:6c34", we get "水" ("water" in Chinese).
But what if one wants to enter "\:1f618" (face throwing a kiss)?
When I tried this, I got "ὡ8", not "a…

Ning
9
votes
1 answer
How are 4-byte characters represented in C#
How are 4-byte chars represented in C#? As one char or as a set of 2 chars?
var someCharacter = 'x'; // put a 4-byte UTF-16 character here

SiberianGuy
9
votes
2 answers
How to render 32-bit Unicode characters in Google V8 (and Node.js)
Does anyone have an idea how to render Unicode 'astral plane' characters (whose CIDs are beyond 0xffff) in Google V8, the JavaScript VM that drives both Google Chrome and Node.js?
Funnily enough, when I give Google Chrome (it identifies as…

flow
8
votes
2 answers
C# Regular Expressions with \Uxxxxxxxx characters in the pattern
Regex.IsMatch( "foo", "[\U00010000-\U0010FFFF]" )
Throws: System.ArgumentException: parsing "[-]" - [x-y] range in reverse order.
Looking at the hex values for \U00010000 and \U0010FFFF I get: 0xd800 0xdc00 for the first character and 0xdbff…

Ben McNiel
8
votes
2 answers
Unicode Supplementary Multilingual Plane in Java
I want to work with the SMP (Supplementary Multilingual Plane) in Java. Actually, I want to print a character whose code point is greater than 0xFFFF. I used this line of code:
int hexCodePoint = Character.toCodePoint('\uD801', '\uDC02' );
to have the…

Shadi
8
votes
1 answer
Can MongoDB store and manipulate strings of UTF-8 with code points outside the basic multilingual plane?
In MongoDB 2.0.6, when attempting to store or query documents that contain string fields whose values include characters outside the BMP, I get a raft of errors like: "Not proper UTF-16: 55357", or "buffer too small".
What…

Eli
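A side note on the error text quoted in the question above: 55357 is 0xD83D in hexadecimal, which falls in the UTF-16 high-surrogate range (0xD800 to 0xDBFF). That suggests, though the excerpt does not confirm it, that one half of a surrogate pair was being handled on its own. The sketch below only classifies a code unit; the function name `surrogateKind` is hypothetical and has nothing to do with MongoDB's own API.

```javascript
// Hypothetical helper: classify a 16-bit code unit as a high surrogate,
// a low surrogate, or neither, using the ranges reserved by UTF-16.
function surrogateKind(codeUnit) {
  if (codeUnit >= 0xd800 && codeUnit <= 0xdbff) return "high";
  if (codeUnit >= 0xdc00 && codeUnit <= 0xdfff) return "low";
  return "none";
}
```

Under this classification, `surrogateKind(55357)` reports a high surrogate, which on its own is not a complete character.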