Let's assume we have a text that contains a Unicode character that cannot be displayed because our font has no corresponding glyph. Usually, a placeholder is displayed instead, e.g. a rectangular block thingy (see screenshot).

Is there a "glyph not found" character that reliably produces this glyph? I'd like to write something like "If the following text contains <insert character here> then you need another font..." in a UI.

By the way, I am not talking about � (replacement character). This one is displayed when a Unicode character could not be correctly decoded from a data stream. It does not necessarily produce the same glyph:

[screenshot]

Sebastian Negraszus
  • The rectangle **is** the "glyph not found" glyph. Don't help. – Hans Passant Dec 05 '12 at 19:57
  • While there are many great answers regarding the "glyph not found" glyph, that won't help you actually detect it, as the text string in code will still have the character regardless of the font used to render it. Some rendering libraries, I think, have the option to query the font, but I have no idea how standard this is. – Deanna Oct 12 '20 at 14:06
  • While I don't think there is a Unicode code point for the "missing glyph", in TrueType and OpenType fonts this is guaranteed to be at glyph ID 0. If you control conversion of Unicode characters to glyphs in the font, you could, for example, map a code point in the private use area to glyph ID 0 and then use this. – jochen Sep 09 '21 at 15:24

8 Answers

From the Unicode Spec:

U+25A1 □ WHITE SQUARE

  • may be used to represent a missing ideograph

  • U+20DE $⃞ combining enclosing square
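
To double-check the names of the code points discussed in this thread, one can ask Python's standard `unicodedata` module. This is only a minimal sketch: the `CANDIDATES` list and the `describe` helper are illustrative names, and the selection of code points is an assumption gathered from the question and answers here, not a set defined by the Unicode Standard.

```python
import unicodedata

# Code points mentioned in this thread (an illustrative selection).
CANDIDATES = [0x25A1, 0x20DE, 0x3013, 0xFFFD]


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME' for a code point, using the Unicode character database."""
    name = unicodedata.name(chr(cp), "<unnamed>")
    return f"U+{cp:04X} {name}"


for cp in CANDIDATES:
    print(describe(cp))
```

All of these are ordinary characters that a font may or may not contain, so none of them is a "glyph not found" character in the sense of the question.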

Michaelangel007

No, there is no “glyph not found” character. Different programs use different graphic presentations. An empty narrow rectangle is a common rendering, but not the only one. It could also be a rectangle with a question mark in it or with the code number of the character, in hexadecimal, in it.

It is therefore better to display, for example, a small image of the character next to the character itself, so that the reader can compare the two.

Sebastian Negraszus
Jukka K. Korpela
  • On several Android phones missing glyphs are drawn with just a few pixels of empty space. So it doesn't even have to be something that is visible. – nibarius Nov 14 '15 at 22:18

The glyph-not-found character is specified by the font engine and by the font; there is no fixed character for it.

Ignacio Vazquez-Abrams
  • The question clearly says that it is not about the replacement character, and REPLACEMENT CHARACTER U+FFFD *is* a fixed character (it does not have a fixed glyph, though fonts that contain it tend to use very similar glyphs). – Jukka K. Korpela Dec 05 '12 at 20:14
  • @Jukka: Except I'm not talking about U+FFFD either. – Ignacio Vazquez-Abrams Dec 05 '12 at 20:44
  • Then don’t use the phrase “replacement character”, because a) it’s not a character at all, and b) it’s specifically not the character with the Unicode name REPLACEMENT CHARACTER, and c) people easily get confused with issues like this. – Jukka K. Korpela Dec 05 '12 at 20:52

Unicode uses these terms:

  • replacement glyph
  • missing glyph
  • interpretable but unrenderable character

The Unicode Standard (10.0) does not define how they have to look, but it suggests in chapter 5.3 [PDF] that implementations display

[…] distinctive glyphs that give some general indication of their type […]

to distinguish them from "unassigned code points". They give some examples:

The Unicode glossary entry says:

It often is shown as an open or black rectangle.


tl;dr: There is no standardized look/glyph, it’s up to the implementation. To help users, implementations could display glyphs that indicate what type of character it is that can’t be displayed.

unor

Use a noncharacter like U+10FFFF (at the very end of the Unicode code space), which is 99.99% certain not to be found in the cmap table of any sane font. No known Windows system font maps that noncharacter to a glyph, and it is highly unlikely that any Linux or Mac system font does either. Even the all-encompassing Last Resort font (http://www.unicode.org/policies/lastresortfont_eula.html) doesn't appear to map it. So while Unicode defines no official "glyph not found" character that maps to the .notdef glyph, this noncharacter will in practice display that glyph, whatever its design is in the particular font. The .notdef glyph (glyph ID 0 in OpenType) may be a simple hollow rectangle (the standard choice), a box with an X, a box with a question mark, occasionally blank (which is bad practice), or sometimes something bizarre like a spiral (in Palatino Linotype).
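
As a rough illustration of why a noncharacter falls through to glyph ID 0, here is a minimal sketch in Python. `is_noncharacter` follows the Unicode definition of the 66 noncharacters (U+FDD0..U+FDEF plus the last two code points of each plane). `TOY_CMAP`, `glyph_id_for`, and `PROBE` are hypothetical names: the dictionary is a toy stand-in for a font's cmap table with made-up glyph IDs, and nothing here reads a real font file.

```python
def is_noncharacter(cp: int) -> bool:
    """True if cp is one of the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


NOTDEF_GLYPH_ID = 0  # .notdef sits at glyph ID 0 in TrueType/OpenType fonts

# Toy stand-in for a font's cmap table: code point -> glyph ID (made-up values).
TOY_CMAP = {0x0041: 36, 0x0042: 37, 0x25A1: 512}


def glyph_id_for(cp: int, cmap: dict = TOY_CMAP) -> int:
    """Look up a glyph ID, falling back to .notdef when the code point is unmapped."""
    return cmap.get(cp, NOTDEF_GLYPH_ID)


# The probe character suggested in this answer. Whether a given UI toolkit
# really draws .notdef for it is an assumption that needs checking per platform.
PROBE = chr(0x10FFFF)
message = f"If the following text contains {PROBE} then you need another font."

print(is_noncharacter(ord(PROBE)))
print(glyph_id_for(0x0041))
print(glyph_id_for(ord(PROBE)))
```

With a real font, the lookup would go through the font's actual cmap via whatever font API the platform offers. The toy dictionary only shows the fallback logic.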

Dwayne Robinson

Also, from what I've heard, Japanese uses U+3013 〓 GETA MARK (in the CJK Symbols and Punctuation block).

martin

There is a .notdef glyph that is shown when a glyph is not found, but it has no character code of its own. You can use the character codes of control characters to insert it (for example U+0002).

Migats21

There are three recommended shapes for the "glyph not found" glyph.

See the topic "Shape of .notdef glyph" in the Microsoft OpenType specification: https://learn.microsoft.com/en-us/typography/opentype/otspec170/recom#shape-of-notdef-glyph

Ted Mielczarek
Lahiru