Not all emoji glyphs are created from a single Unicode code point. Some characters only take on an emoji appearance when combined into an "emoji sequence", which makes it hard to count emoji by code points alone. This is what Section 3 of UTR #51 is trying to convey, but some examples may help:
- U+0031 is 1, but takes on an emoji appearance when combined with U+FE0F U+20E3: 1️⃣ (or a plain appearance with U+FE0E U+20E3: 1︎⃣).
- U+1F170 is 🅰, but takes on an emoji appearance when combined with U+FE0F: 🅰️.
- U+2620 is ☠︎, but takes on an emoji appearance when combined with U+FE0F: ☠️.
- (In general, the U+FE0F variation selector was used to turn many existing characters into corresponding emoji sequences without having to encode each of them as a separate code point.)
- The regional indicator symbols only appear as emoji when they form a country/region code: U+1F1E6 U+1F1F6 (AQ) renders as a flag, but U+1F1E6 U+1F1F5 (AP) does not.
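To make the counting problem concrete, here is a minimal Python sketch that builds the sequences from the examples above and shows how many code points each one contains. The helper names (`to_text`, `names`, `region_letters`) are made up for illustration. Note that `region_letters` only decodes the letters; it does not check whether the pair is a valid region code.

```python
import unicodedata

REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

# Code point sequences taken from the examples in the text.
EXAMPLES = {
    "keycap 1": [0x0031, 0xFE0F, 0x20E3],
    "A button": [0x1F170, 0xFE0F],
    "skull and crossbones": [0x2620, 0xFE0F],
    "regional indicators A + Q": [0x1F1E6, 0x1F1F6],
}


def to_text(code_points):
    """Join a list of code points into a Python string."""
    return "".join(chr(cp) for cp in code_points)


def names(code_points):
    """Return the Unicode character name of every code point."""
    return [unicodedata.name(chr(cp)) for cp in code_points]


def region_letters(code_points):
    """Map regional indicator symbols to the ASCII letters they stand for.

    This only decodes the letters; whether the result is a valid region
    code has to be looked up elsewhere.
    """
    return "".join(chr(cp - REGIONAL_INDICATOR_A + ord("A")) for cp in code_points)


if __name__ == "__main__":
    for label, cps in EXAMPLES.items():
        text = to_text(cps)
        # len() on a Python 3 str counts code points, not visible glyphs.
        print(f"{label}: {text} -> {len(text)} code points")
        for name in names(cps):
            print(f"    {name}")
    print(region_letters([0x1F1E6, 0x1F1F6]))
    print(region_letters([0x1F1E6, 0x1F1F5]))
```

Each of these displays as (at most) one glyph, yet `len()` reports two or three, which is why counting code points does not tell you how many emoji a string contains.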
The emoji-data.txt file lists all characters that have the Emoji=Yes character property. These are all base characters (e.g. 1, 🅰, ☠︎, 🇦, 🇶, 🇵) that can at least start an emoji sequence, even if they are not a complete sequence by themselves. The emoji-test.txt file lists all complete emoji sequences.
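If you want to work with emoji-test.txt programmatically, something like the following sketch could be a starting point. It assumes the usual line layout of that file (`code points ; status # comment`), and the `SAMPLE` text is a hand-written excerpt in that shape rather than the real file. The function names are hypothetical.

```python
# Hand-written excerpt in the shape of emoji-test.txt (an assumption, not
# the real file): "<code points> ; <status> # <emoji> <version> <name>".
SAMPLE = """\
# group: Symbols
0031 FE0F 20E3 ; fully-qualified # 1️⃣ E0.6 keycap: 1
2620 FE0F      ; fully-qualified # ☠️ E1.0 skull and crossbones
2620           ; unqualified     # ☠ E1.0 skull and crossbones

# group: Flags
1F1E6 1F1F6    ; fully-qualified # 🇦🇶 E2.0 flag: Antarctica
"""


def parse_emoji_test(text):
    """Yield (code_points, status) for every data line in the text."""
    for line in text.splitlines():
        # Everything after the first '#' is a comment.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment-only line
        code_point_field, status = (part.strip() for part in data.split(";", 1))
        code_points = tuple(int(cp, 16) for cp in code_point_field.split())
        yield code_points, status


def complete_sequences(text):
    """Return the fully-qualified sequences as Python strings."""
    return [
        "".join(chr(cp) for cp in code_points)
        for code_points, status in parse_emoji_test(text)
        if status == "fully-qualified"
    ]


if __name__ == "__main__":
    for sequence in complete_sequences(SAMPLE):
        print(sequence, len(sequence))
```

Filtering on the `fully-qualified` status is one design choice among several; depending on your use case you may also want to accept the other status values the file uses.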