Questions tagged [codepoint]

A code point is any of the numeric values that make up the Unicode codespace, which ranges from U+0000 to U+10FFFF.

A code point may represent a character or have another meaning. The standard defines seven fundamental classes of code points: Graphic, Format, Control, Private-Use, Surrogate, Noncharacter, and Reserved.
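
As a rough illustration of the seven classes, the sketch below maps a code point to its class using the General Category reported by Python's `unicodedata` module. The function name `code_point_class` is hypothetical, and the result for unassigned code points depends on the Unicode version bundled with the interpreter, so treat this as an approximation rather than a definitive classifier.

```python
import unicodedata


def code_point_class(cp: int) -> str:
    """Return one of the seven fundamental code point classes (illustrative sketch)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "Noncharacter"
    category = unicodedata.category(chr(cp))
    if category == "Cc":
        return "Control"
    if category in ("Cf", "Zl", "Zp"):
        return "Format"
    if category == "Cs":
        return "Surrogate"
    if category == "Co":
        return "Private-Use"
    if category == "Cn":
        # Unassigned in the Unicode version known to this interpreter.
        return "Reserved"
    # Letters, marks, numbers, punctuation, symbols, and space separators.
    return "Graphic"


for cp in (0x41, 0x0A, 0x200D, 0xD800, 0xE000, 0xFFFF, 0x40000):
    print(f"U+{cp:04X}", code_point_class(cp))
```

The noncharacter check comes first because `unicodedata` reports noncharacters as `Cn`, the same category it uses for reserved (unassigned) code points.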

116 questions
124
votes
3 answers

What are the most common non-BMP Unicode characters in actual use?

In your experience which Unicode characters, codepoints, ranges outside the BMP (Basic Multilingual Plane) are the most common so far? These are the ones which require 4 bytes in UTF-8 or surrogates in UTF-16. I would've expected the answer to be…
hippietrail
  • 15,848
  • 18
  • 99
  • 158
92
votes
5 answers

Get unicode code point of a character using Python

In Python API, is there a way to extract the unicode code point of a single character? Edit: In case it matters, I'm using Python 2.7.
Ken
  • 30,811
  • 34
  • 116
  • 155
89
votes
4 answers

Why is 'U+' used to designate a Unicode code point?

Why do Unicode code points appear as U+? For example, U+2202 represents the character ∂. Why not U- (dash or hyphen character) or anything else?
Senthil Kumaran
  • 54,681
  • 14
  • 94
  • 131
78
votes
2 answers

Why does the red heart emoji require two code points, but the other colored hearts require one?

It appears that the red heart emoji (❤️) "\u2764\uFE0F" requires two Unicode codepoints, specifically Heavy Black Heart followed by a Variation Selector. However, the blue, green, yellow, and purple hearts each have their own single codepoint. Why is red…
Newtang
  • 6,414
  • 10
  • 49
  • 70
52
votes
3 answers

Difference between codePointAt and charCodeAt

What is the difference between String.prototype.codePointAt() and String.prototype.charCodeAt() in JavaScript? 'A'.codePointAt(); // 65 'A'.charCodeAt(); // 65
Stanislav Mayorov
  • 4,298
  • 5
  • 21
  • 44
35
votes
4 answers

What exactly does String.codePointAt do?

Recently I ran into codePointAt method of String in Java. I found also a few other codePoint methods: codePointBefore, codePointCount etc. They definitely have something to do with Unicode but I do not understand it. Now I wonder when and how one…
Michael
  • 41,026
  • 70
  • 193
  • 341
32
votes
4 answers

Split JavaScript string into array of codepoints? (taking into account "surrogate pairs" but not "grapheme clusters")

Splitting a JavaScript string into "characters" can be done trivially but there are problems if you care about Unicode (and you should care about Unicode). JavaScript natively treats characters as 16-bit entities (UCS-2 or UTF-16) but this does not…
hippietrail
  • 15,848
  • 18
  • 99
  • 158
27
votes
3 answers

Does Unicode have a defined maximum number of code points?

I have read many articles in order to know what is the maximum number of the Unicode code points, but I did not find a final answer. I understood that the Unicode code points were minimized to make all of the UTF-8 UTF-16 and UTF-32 encodings able…
user4344762
23
votes
4 answers

How to output unicode string to RTF (using C#)

I'm trying to output unicode string into RTF format. (using c# and winforms) From wikipedia: If a Unicode escape is required, the control word \u is used, followed by a 16-bit signed decimal integer giving the Unicode codepoint number. For the…
Emir
  • 1,586
  • 3
  • 16
  • 32
19
votes
2 answers

What is exactly an overlong form/encoding?

Reading the Wikipedia article on UTF-8, I've been wondering about the term overlong. This term is used various times but the article doesn't provide a definition or reference for its meaning. I would like to know if someone can explain the term and…
nEAnnam
  • 1,246
  • 2
  • 16
  • 22
19
votes
1 answer

Why is Unicode restricted to 0x10FFFF?

Why is the maximum Unicode code point restricted to 0x10FFFF? Is it possible to represent Unicode above this code point - for e.g. 0x10FFFF + 0x000001 = 0x110000 - through any encoding schemes like UTF-16, UTF-8?
dinesh ranawat
  • 223
  • 2
  • 8
9
votes
2 answers

What are the consequences of storing a C# string (UTF-16) in a SQL Server nvarchar (UCS-2) column?

It seems that SQL Server uses Unicode UCS-2, a 2-byte fixed-length character encoding, for nchar/nvarchar fields. Meanwhile, C# uses Unicode UTF-16 encoding for its strings (note: Some people don't consider UCS-2 to be Unicode, but it encodes all…
Triynko
  • 18,766
  • 21
  • 107
  • 173
9
votes
2 answers

Finding Unicode character name with Javascript

I need to find out the names for Unicode characters when the user enters the number for it. An example would be to enter 0041 and get given "Latin Capital Letter A" as the result.
TomC
  • 251
  • 1
  • 3
  • 4
9
votes
1 answer

How to cast a QChar to int

In C++ there is a way to cast a char to int and get the ascii value in return. Is there such a way to do the same with a qchar? Since unicode supports so many characters and some of them are actually looking alike, it is sometimes hard to tell what…
alexander remus
  • 409
  • 1
  • 4
  • 11
8
votes
2 answers

Are all Unicode Emoji ZWJ Sequences valid?

When creating an emoji font, is any sequence of ZERO WIDTH JOINER valid? For instance: can I use ‍★‍ (Waving White Flag + zwj + Black Star + zwj + Green Square) to represent a white flag with a green star on it? And then render it, lets say like…
Alexander
  • 238
  • 1
  • 9