
When creating an emoji font, is any sequence of ZERO WIDTH JOINER valid?

For instance: can I use 🏳‍★‍🟩 (Waving White Flag + ZWJ + Black Star + ZWJ + Green Square) to represent a white flag with a green star on it? And then render it, let's say, like the Esperanto flag?

Abel
Alexander

2 Answers


There are restrictions on what can be part of ZWJ sequences and what cannot. Unicode Technical Standard #51 lays out these rules.

According to definition ED-15a, a well-formed ZWJ sequence can only consist of:

  • Emoji characters (a character with the property Emoji=True)
  • Emoji presentation sequences (an emoji character followed by U+FE0F VARIATION SELECTOR-16, all valid combinations of which are listed in this data file)
  • Emoji modifier sequences (a character with the property Emoji_Modifier_Base=True followed by a character with the property Emoji_Modifier=True)
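To make the ED-15a rules concrete, here is a minimal Python sketch of a well-formedness check. It assumes a tiny, hand-picked subset of the emoji-data.txt and emoji-variation-sequences.txt properties (the sets below are illustrative, not the real data), and it only checks the grammar: a sequence can be well-formed without being recommended or supported by any font.

```python
# Hypothetical, hand-picked subset of the Unicode emoji data files.
# A real checker would load the full emoji-data.txt and
# emoji-variation-sequences.txt instead of these sets.
EMOJI = {0x1F3F3, 0x2B50, 0x1F7E9, 0x1F44B}          # Emoji=Yes
EMOJI_MODIFIER_BASE = {0x1F44B}                     # Emoji_Modifier_Base=Yes
EMOJI_MODIFIER = set(range(0x1F3FB, 0x1F400))       # skin tones U+1F3FB..U+1F3FF
VS16_BASES = {0x1F3F3, 0x2B50}                      # listed emoji presentation sequences

ZWJ = "\u200D"
VS16 = 0xFE0F


def is_zwj_element(cps):
    """True if cps is an emoji character, an emoji presentation sequence,
    or an emoji modifier sequence (the three element types of ED-15a)."""
    if len(cps) == 1:
        return cps[0] in EMOJI
    if len(cps) == 2:
        base, second = cps
        if second == VS16:
            return base in VS16_BASES
        return base in EMOJI_MODIFIER_BASE and second in EMOJI_MODIFIER
    return False


def is_well_formed_zwj_sequence(text):
    """Elements joined by at least one ZWJ, every element valid."""
    parts = text.split(ZWJ)
    if len(parts) < 2:
        return False
    return all(is_zwj_element([ord(c) for c in part]) for part in parts)


print(is_well_formed_zwj_sequence("\U0001F3F3\uFE0F\u200D\u2B50\u200D\U0001F7E9"))
print(is_well_formed_zwj_sequence("\U0001F3F3\uFE0F\u200D\u2605\u200D\U0001F7E9"))
```

With U+2B50 the first call is accepted; with U+2605 BLACK STAR the second is rejected, because BLACK STAR is not an emoji character.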

All relevant properties can be found in this data file.

U+2605 BLACK STAR is not an emoji character (and it is obviously not one of those types of sequences either), so it would not be valid for use in ZWJ sequences as of the time of writing, but you could substitute U+2B50 WHITE MEDIUM STAR (which is an emoji) instead. Other than that, 🏳️ and 🟩 are fair game.

Side note on U+1F3F3 WAVING WHITE FLAG: This character is an emoji, but it has the property Emoji_Presentation=False, which means it is intended to display as text-style (monochrome rather than colourful) by default. To force emoji-style display, U+FE0F VARIATION SELECTOR-16 has to be appended to it. It is recommended that these variation selectors always be included for characters where Emoji_Presentation=False.

U+2B50 WHITE MEDIUM STAR is also a valid base for such emoji presentation sequences, but it has Emoji_Presentation=True by default and the variation selector is thus entirely optional. U+1F7E9 LARGE GREEN SQUARE meanwhile is not a valid base for emoji presentation sequences and therefore must never be followed by VARIATION SELECTOR-16. I know, it’s convoluted.
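The variation selector rules above can be sketched as a small helper that decides, per character, whether to append U+FE0F. The property table is a hypothetical three-entry excerpt covering only the characters in this question; a real tool would read emoji-data.txt and emoji-variation-sequences.txt.

```python
# Hypothetical excerpt of emoji-data.txt: Emoji_Presentation per character.
EMOJI_PRESENTATION = {
    0x1F3F3: False,  # WAVING WHITE FLAG: text-style by default
    0x2B50: True,    # WHITE MEDIUM STAR: emoji-style by default
    0x1F7E9: True,   # LARGE GREEN SQUARE: emoji-style by default
}
# Hypothetical excerpt of emoji-variation-sequences.txt: valid bases for VS16.
VS16_BASES = {0x1F3F3, 0x2B50}

VS16 = "\uFE0F"
ZWJ = "\u200D"


def add_vs16(cp, always=False):
    """Return the character, appending VS16 where it is needed (or, with
    always=True, wherever it is allowed)."""
    ch = chr(cp)
    if cp not in VS16_BASES:
        return ch               # e.g. U+1F7E9: VS16 must never follow it
    if not EMOJI_PRESENTATION[cp] or always:
        return ch + VS16        # text-default emoji need VS16
    return ch                   # emoji-default: VS16 is optional, omitted here


def build_zwj_sequence(cps, always_vs16=False):
    """Join the (possibly VS16-suffixed) characters with ZWJ."""
    return ZWJ.join(add_vs16(cp, always_vs16) for cp in cps)


flag = [0x1F3F3, 0x2B50, 0x1F7E9]
print([f"U+{ord(c):04X}" for c in build_zwj_sequence(flag)])
print([f"U+{ord(c):04X}" for c in build_zwj_sequence(flag, always_vs16=True)])
```

The two calls produce the two codepoint sequences listed below: the first omits the optional selector after the star, the second includes it.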

What all this means is that you have two choices for the precise sequence of codepoints you want to use, both of which are equally valid. Either:

🏳️‍⭐‍🟩 <U+1F3F3, U+FE0F, U+200D, U+2B50, U+200D, U+1F7E9>

Or:

🏳️‍⭐️‍🟩 <U+1F3F3, U+FE0F, U+200D, U+2B50, U+FE0F, U+200D, U+1F7E9>
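Because ZWJ and VS16 are invisible, it is easy to lose them when copying a sequence around. A quick way to check what a string actually contains is to print it in the same <U+XXXX, ...> notation used above; `describe` is a hypothetical helper name.

```python
def describe(seq):
    """Format a string as <U+XXXX, ...>, one entry per code point."""
    return "<" + ", ".join(f"U+{ord(c):04X}" for c in seq) + ">"


# The two sequences above, written with explicit escapes.
minimal = "\U0001F3F3\uFE0F\u200D\u2B50\u200D\U0001F7E9"
explicit = "\U0001F3F3\uFE0F\u200D\u2B50\uFE0F\u200D\U0001F7E9"

print(describe(minimal))   # <U+1F3F3, U+FE0F, U+200D, U+2B50, U+200D, U+1F7E9>
print(describe(explicit))  # <U+1F3F3, U+FE0F, U+200D, U+2B50, U+FE0F, U+200D, U+1F7E9>
```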

CharlotteBuff
  • this is a very insightful answer. Thank you very much! – Alexander May 01 '20 at 12:03
  • ZWJ predates emojis. Wrt emojis, the above is true. However, it is valid in many other codepoint sequences, so don’t make assumptions that ZWJ without emojis is invalid. The TR above, btw, is specifically about emojis. See, for instance: https://unicode-explorer.com/c/200D – Abel Jan 26 '23 at 14:20
  • @Abel The term “ZWJ sequence” is exclusively used in the context of emoji, so there is no ambiguity in this regard. Obviously any Unicode characters can be used in any sequence and none of these combinations are “invalid” even if they aren’t necessarily meaningful, but when it comes to emoji it’s generally best to follow UTS #51. – CharlotteBuff Jan 26 '23 at 22:03
  • I think you mean "emoji ZWJ sequence"? That document is specific to emojis. In fact, it explains the use of ZWJ in different codepoint sequences as well. Using ZWJ is necessary in Arabic and many Indic scripts. But anyway, this question appears to be specifically about "Emoji ZWJ sequence", and in that context it is all correct. I clarified the title to that effect. – Abel Feb 12 '23 at 17:52

I notice your interest in creating the Esperanto flag, but I think font rendering is more complex than just lining up codepoints.

Your brute force approach does not work "as is".

<div>
    &#x1F3F3;&#xFE0F;&#x200D;&#x2605;&#x200D;&#x1F7E9;
</div>

The Unicode standard says in Recommended Emoji ZWJ Sequences, v13.0:

The following are the recommended emoji zwj sequences, which use a U+200D ZERO WIDTH JOINER (ZWJ) to join the characters into a single glyph if available. When not available, the ZWJ characters are ignored and a fallback sequence of separate emoji is displayed. Thus an emoji zwj sequence should only be supported where the fallback sequence would also make sense to a viewer.
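The fallback behaviour described in that quote can be modelled roughly as follows. This is a simplification under stated assumptions: `supported` is a hypothetical stand-in for whatever sequences the font actually renders as one glyph, and a real renderer may also ligate just part of a longer sequence.

```python
ZWJ = "\u200D"


def fallback_sequence(seq, supported):
    """Return the units a renderer would display: one glyph if the whole
    sequence is supported, otherwise the separate emoji with ZWJs ignored."""
    if seq in supported:
        return [seq]
    return [part for part in seq.split(ZWJ) if part]


flag = "\U0001F3F3\uFE0F\u200D\u2B50\u200D\U0001F7E9"
print(len(fallback_sequence(flag, set())))   # 3 separate emoji: flag, star, square
print(len(fallback_sequence(flag, {flag})))  # 1 combined glyph
```

The three-emoji fallback (white flag, star, green square) is what a viewer without the font would see, which is why the quote asks that the fallback also make sense.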

I was wondering which part of the font rendering mechanism would be responsible for checking the "availability" (i.e. the rendering engine supporting a certain Unicode version, or the application, or the font), and guessed "the font".

So I came across this article on Emoji fonts, and indeed, font files can contain data on Ligature Substitution, see OpenType for example. Microsoft provides a tool called VOLT which allows the definition of ligatures.
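To illustrate what a ligature substitution does, here is a toy Python model of an OpenType ligature lookup (GSUB lookup type 4) applied to a run of glyphs. The glyph names and the `esperanto_flag` glyph are hypothetical; in a real font these rules live in the GSUB table (authored with a tool such as VOLT) and the shaping engine applies them, not your own code.

```python
# Hypothetical ligature table: component glyph run -> ligature glyph.
# Both variants (with and without the optional VS16 after the star) map
# to the same hypothetical flag glyph.
LIGATURES = {
    ("u1F3F3", "uFE0F", "u200D", "u2B50", "u200D", "u1F7E9"): "esperanto_flag",
    ("u1F3F3", "uFE0F", "u200D", "u2B50", "uFE0F", "u200D", "u1F7E9"): "esperanto_flag",
}


def glyph_names(text):
    """Map each code point to a hypothetical 'uXXXX' glyph name."""
    return [f"u{ord(c):04X}" for c in text]


def apply_ligatures(glyphs, ligatures):
    """Scan left to right, replacing the first matching component run.
    Longer runs are tried first, a common way of ordering ligature sets."""
    by_length = sorted(ligatures, key=len, reverse=True)
    out, i = [], 0
    while i < len(glyphs):
        for comps in by_length:
            if tuple(glyphs[i:i + len(comps)]) == comps:
                out.append(ligatures[comps])
                i += len(comps)
                break
        else:
            out.append(glyphs[i])  # no ligature starts here; keep the glyph
            i += 1
    return out


flag = "\U0001F3F3\uFE0F\u200D\u2B50\u200D\U0001F7E9"
print(apply_ligatures(glyph_names(flag), LIGATURES))  # ['esperanto_flag']
```

If no rule matches, the glyphs pass through unchanged, which corresponds to the fallback case in the quote above.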

I have no idea about font design, but I would try to create a colored flag glyph with a font editor (sketched here), and define the ligature substitution. (no implied warranty ;) )

devio
  • I am aware that existing emoji implementations might not support that ligature. My question is, would it be valid if they did? – Alexander May 01 '20 at 09:44
  • But there is an interesting quote in that document you linked to: _Thus an emoji zwj sequence should only be supported where the fallback sequence would also make sense to a viewer._ – Alexander May 01 '20 at 09:46
  • my understanding is that it is the font definition that defines if it's valid – devio May 01 '20 at 10:44
  • since this issue is not exactly a programming question, maybe it can be answered on another SE site: https://graphicdesign.stackexchange.com/?tags=fonts – devio May 01 '20 at 10:51