I am trying to find the official list of emoji code points as defined by the Unicode Standard, and I am running into some conflicting sources. According to emoji-data.txt, even digits are emoji, while other sources at unicode.org suggest that only a subset of that list is emoji, for instance the emoji chart and the emoji test file. There is even a section in Unicode Technical Report #51 called "Which Characters are Emoji", but it does not really answer my question, or at least I can't find the answer there.

So, which Unicode code points are emoji?

1 Answer

Not all emoji glyphs are created from a single Unicode code point. Some characters only take on an emoji appearance when combined into an "emoji sequence", which makes it hard to count emoji by code points alone. This is what Section 3 of UTR #51 is trying to convey, though it could benefit from some examples:

  • U+0031 is 1, but takes on an emoji appearance when combined with U+FE0F U+20E3: 1️⃣ (or a plain appearance with U+FE0E U+20E3: 1︎⃣).
  • U+1F170 is 🅰, but takes on an emoji appearance when combined with U+FE0F: 🅰️.
  • U+2620 is ☠︎, but takes on an emoji appearance when combined with U+FE0F: ☠️.
  • (In general, the U+FE0F variation sequence was used to turn many existing characters into a corresponding emoji sequence without having to encode them as separate code points.)
  • The regional indicator symbols only appear as emoji when they form a valid country/region code: U+1F1E6 U+1F1F6 🇦🇶, but not U+1F1E6 U+1F1F5 🇦🇵.
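To make the examples above concrete, here is a minimal Python sketch that assembles those sequences from individual code points and checks whether a pair of regional indicators spells a region code. The helper names (`code_points`, `regional_indicators`, `is_flag`) are hypothetical, and `VALID_REGIONS` is an assumed two-entry stand-in for the real list of valid region codes, which would have to come from Unicode's emoji data rather than be hard-coded.

```python
# Hypothetical stand-in for the real list of valid region codes; a real check
# would load it from Unicode's emoji data. Only two entries for illustration.
VALID_REGIONS = {"AQ", "FR"}

RI_FIRST = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
RI_LAST = 0x1F1FF   # REGIONAL INDICATOR SYMBOL LETTER Z


def code_points(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def regional_indicators(region: str) -> str:
    """Spell a two-letter region code such as "AQ" with regional indicator symbols."""
    return "".join(chr(RI_FIRST + ord(c) - ord("A")) for c in region.upper())


def is_flag(s: str) -> bool:
    """True if s is a pair of regional indicators spelling a code in VALID_REGIONS."""
    if len(s) != 2 or not all(RI_FIRST <= ord(ch) <= RI_LAST for ch in s):
        return False
    code = "".join(chr(ord(ch) - RI_FIRST + ord("A")) for ch in s)
    return code in VALID_REGIONS


# The sequences from the examples above, built from individual code points.
KEYCAP_ONE = "1\uFE0F\u20E3"       # digit one + VS16 (emoji style) + COMBINING ENCLOSING KEYCAP
TEXT_KEYCAP_ONE = "1\uFE0E\u20E3"  # digit one + VS15 (text style) + COMBINING ENCLOSING KEYCAP
A_BUTTON = "\U0001F170\uFE0F"       # U+1F170 + VS16
SKULL = "\u2620\uFE0F"              # U+2620 + VS16

for name, seq in [("keycap 1", KEYCAP_ONE), ("A button", A_BUTTON), ("skull", SKULL)]:
    print(f"{name}: {code_points(seq)}")
print(is_flag(regional_indicators("AQ")))  # True: "AQ" is in VALID_REGIONS
print(is_flag(regional_indicators("AP")))  # False: "AP" is not in VALID_REGIONS
```

Note that whether a sequence actually renders as emoji is up to the font and platform; the code only shows which code points make up each sequence.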

The emoji-data.txt file lists all characters that have the Emoji=Yes character property. These are all base characters (e.g. 1, 🅰, ☠︎, and the individual regional indicator symbols) that can at least start an emoji sequence, even if they are not complete sequences by themselves. The emoji-test.txt file, by contrast, lists all complete emoji sequences.
