
I am trying to read every glyph a font provides (547 in my case). Here's what I've done so far:

    private static String getCharacters(Font font) {
        // Number of glyphs the font reports; used as the target count.
        final int glyphs = font.getNumGlyphs();
        System.out.println("Searching for " + glyphs + " glyphs!");
        int[] codePoints = new int[glyphs];
        int found = 0;
        // Probe every Unicode code point (inclusive upper bound) until enough are found.
        for (int codePoint = Character.MIN_CODE_POINT; codePoint <= Character.MAX_CODE_POINT
                && found < glyphs; codePoint++) {
            // canDisplay only says whether the font maps this code point to a glyph.
            if (font.canDisplay(codePoint)) {
                codePoints[found++] = codePoint;
            }
        }
        System.out.println("Missing " + (glyphs - found) + " glyphs!");
        return new String(codePoints, 0, found);
    }

And this is the output:

Searching for 547 glyphs!
Missing 160 glyphs!

Well, the problem is obvious: Where have my 160 glyphs gone?

For anyone trying to reproduce, I am using the Cinzel Regular font.

Thanks in advance for any help!

  • Not all codepoints make marks on paper. For example, "zero width space" and "right to left override". They do not need to be in fonts. – user16632363 Nov 06 '21 at 00:17
  • Many glyphs represent more than one codepoint. For example, many emoji are built from multiple codepoints. Also, some ligatures are a single glyph: many fonts provide the letter combination `fi` as a single glyph to improve readability. Some codepoints may also have several representations. For example, most Arabic letters are rendered differently depending on their position within a word (start, middle, end). That would also produce multiple glyphs for the same codepoint. – Joachim Sauer Nov 06 '21 at 00:17
  • FI LIGATURE is a single codepoint: https://www.compart.com/en/unicode/U+FB01 – user16632363 Nov 06 '21 at 00:19
  • @user16632363: yes, that's a bad example because it also exists as a codepoint (but the same glyph might also be used to render a "normal" `f` followed by an `i`). But basically any character combination might get a dedicated ligature in a font, even ones that don't exist in Unicode. `fh` is a better example that might be provided by the font as a ligature but has no corresponding (single) codepoint in Unicode. – Joachim Sauer Nov 06 '21 at 00:21
  • To quote the [Javadoc for `Font`](https://docs.oracle.com/en/java/javase/17/docs/api/java.desktop/java/awt/Font.html): … In general, however, characters and glyphs do not have one-to-one correspondence. … – Basil Bourque Nov 06 '21 at 00:47
  • @user16632363 Zero width space does need to be in the font, if the font wants to support all or most of Unicode. A glyph is not just visual drawing; each glyph has its own metrics, including zero width space. The same is true for all whitespace codepoints. – VGR Nov 06 '21 at 00:47
  • Note that not every codepoint between MIN_CODE_POINT and MAX_CODE_POINT is a valid character. For instance, U+FFFE and U+FFFF are guaranteed by the Unicode specification never to be valid characters. The same is true for U+1FFFE and U+1FFFF, U+2FFFE and U+2FFFF, etc. – VGR Nov 06 '21 at 00:51
  • So what I'm getting is that the Java AWT font library is extremely limited, as I can't even get all the glyphs a font contains? – DasBabyPixel Nov 06 '21 at 08:32
  • To clarify: Is there ANY way without use of third party software to get ligature and kerning information out of the font? – DasBabyPixel Nov 06 '21 at 09:06
  • @DasBabyPixel: I mean you can always write a TTF parser yourself, if you feel like it. But if you told us what you're trying to achieve, we might be able to point you towards a solution. This smells like an [XY Problem](https://xyproblem.info). – Joachim Sauer Nov 07 '21 at 19:08
  • If your question is how to get ligature and kerning information out of the font, you should ask that, instead of asking a nonsensical question about “all codepoints” or making bold statements like “the java awt font library is extremely limited”. As has already been explained, there is no one-to-one mapping between codepoints and glyphs. You *can* get the glyphs of a font, all of them, but your code doesn’t ask for them. All your code does is ask whether a codepoint is displayable. – Holger Feb 10 '22 at 10:09
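
As a rough sketch of what the comments above describe: the loop in the question only counts code points that the font maps to a glyph, whereas glyphs can also be requested directly by glyph code through `Font.createGlyphVector(FontRenderContext, int[])`. The class name `GlyphInventory` and its helper methods are made up for illustration. The sketch assumes a physical font loaded from a file numbers its glyph codes from 0 to `getNumGlyphs() - 1`, which may not hold for composite logical fonts.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.io.File;

public class GlyphInventory {

    /** True for the code points Unicode reserves as noncharacters (U+FDD0..U+FDEF, U+nFFFE, U+nFFFF). */
    public static boolean isNoncharacter(int codePoint) {
        return (codePoint & 0xFFFE) == 0xFFFE
                || (codePoint >= 0xFDD0 && codePoint <= 0xFDEF);
    }

    /** Glyph codes 0..count-1 (assumption: the font numbers its glyphs contiguously from 0). */
    public static int[] glyphCodes(int count) {
        int[] codes = new int[count];
        for (int i = 0; i < count; i++) {
            codes[i] = i;
        }
        return codes;
    }

    /** Counts code points the font maps to a glyph, which is what the question's loop measures. */
    public static int countDisplayableCodePoints(Font font) {
        int found = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (!isNoncharacter(cp) && font.canDisplay(cp)) {
                found++;
            }
        }
        return found;
    }

    /** Builds one GlyphVector holding every glyph of the font, addressed by glyph code. */
    public static GlyphVector allGlyphs(Font font) {
        // A null transform means identity; anti-aliasing and fractional metrics are enabled.
        FontRenderContext frc = new FontRenderContext(null, true, true);
        return font.createGlyphVector(frc, glyphCodes(font.getNumGlyphs()));
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java GlyphInventory <font.ttf>");
            return;
        }
        Font font = Font.createFont(Font.TRUETYPE_FONT, new File(args[0])).deriveFont(24f);
        GlyphVector glyphs = allGlyphs(font);

        // Count glyphs that actually draw something; whitespace glyphs have empty outlines.
        int withOutline = 0;
        for (int i = 0; i < glyphs.getNumGlyphs(); i++) {
            if (!glyphs.getGlyphOutline(i).getBounds2D().isEmpty()) {
                withOutline++;
            }
        }
        System.out.println("Glyphs reported by the font:  " + font.getNumGlyphs());
        System.out.println("Glyphs with a visible shape:  " + withOutline);
        System.out.println("Displayable code points:      " + countDisplayableCodePoints(font));
    }
}
```

Run it with the path to a font file, for example the Cinzel Regular TTF. Any gap between the glyph count and the displayable code point count would then be glyphs that no single code point reaches, such as ligatures or alternate forms. Metrics for each glyph are available from the same vector via `getGlyphMetrics(int)`.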
