
Using Swift, is it possible to get the Unicode code points for the glyphs in a TrueType font (TTF) file?

The CGFont doesn't seem to include a reference to the cmap table required to map Unicode code points to glyphs.

The CTGlyphInfo class is close, but it doesn't provide the inverse mapping, that is, the Unicode code point for a given glyph.

Crashalot
  • That's because cmap encodes how to resolve _many_ different kinds of glyph codes to the unique in-font glyph id. If all you have is the glyph id, then which cmap subtable would you do the reverse lookup in? Subtable format 4? 12? And if those two, _which one_? The Windows one? The Mac one? So I think before it makes sense to answer this: can you explain what you're actually trying to do that you think requires resolving a font's internal glyph id to its UCS-2 or UCS-4 identifier? – Mike 'Pomax' Kamermans Nov 21 '19 at 04:22
  • @Mike'Pomax'Kamermans Hi Mike! Thanks for the comment. The goal is to render the glyphs in a TTF as an icon font, so we need the Unicode values to render the icons. – Crashalot Nov 21 '19 at 04:31
  • Not sure I follow: if you made the font, you already know which Unicode code points are defined and which glyphs they map to. Why would you only know the font-internal glyph id? You have to _get_ that id somehow, and if you got it by asking for the glyph id based on an actual letter you typed or passed in, you already know the Unicode code point, because you have the letter. Just look up its Unicode value. – Mike 'Pomax' Kamermans Nov 21 '19 at 05:24
  • Sorry, we didn't make the font. It's a TTF file someone else made. We only have the TTF file. @Mike'Pomax'Kamermans – Crashalot Nov 21 '19 at 05:29
  • Then you don't want to use Swift for this part of the work: use FontForge or something to inspect the font and get the list of supported glyphs in the charsets you care about, and then rely on that. – Mike 'Pomax' Kamermans Nov 21 '19 at 05:29
  • @Mike'Pomax'Kamermans ok thanks, but just to confirm, FontForge can export the list of Unicode code points? We need the list exported (hence the original desire to use a Swift script). – Crashalot Nov 21 '19 at 05:31
  • Amongst other things. And anything it can't do, you can _make_ it do with a bit of Python. Alternatively, you can use the `ttx` command from fonttools (`pip install fonttools`) to export only the cmaps in the font to an XML file. – Mike 'Pomax' Kamermans Nov 21 '19 at 16:29
  • @Mike'Pomax'Kamermans thanks so much for your advice! – Crashalot Nov 21 '19 at 16:42
  • Keep in mind that not all glyphs map to a Unicode code point. The most common reason is ligatures, which may group several characters into a single glyph. Single Unicode code points can also map to several glyphs depending on context. A very common form of that is Arabic isolated/initial/medial/final forms which often look very different, despite having the same codepoint. (see the 'morx' table for more on these substitutions) – Rob Napier Nov 21 '19 at 17:16
  • @RobNapier thanks so much. off-hand, what's an example of a ligature that groups several characters into a single glyph, and do you know why this is done? – Crashalot Nov 21 '19 at 17:23
  • In Latin alphabets, the most famous is the ligature for "f" followed by "i": fi. It exists in many fonts, because otherwise the f's cross bar runs into the i's dot and makes it ugly. In Latin fonts, ff also often has a ligature. The most egregious ligature in a Latin font is Zapfino's ligature for…Zapfino. (It also has numerous other ligatures & variants to get that calligraphy look. It's an outrageous font. I love it.) In Romance languages with accents, ligatures can often fix things up to look better than just slapping an accent over an i, for instance, where it might intersect the dot. – Rob Napier Nov 21 '19 at 18:38
  • In languages like Arabic, some ligatures are mandatory. For example, ل followed by ا should be written as لا (that's two characters, not one; try selecting it). That's not just for looks; it's part of the language. Arabic also has many ligatures built into Unicode, so they are a single Unicode code point. The word(s) ﷸ is a single code point, and so is ﷺ (despite being a whole phrase). And the most famous, of course: ﷽ (that's one "character" in Unicode). These do have a clean Unicode->glyph mapping, but I just wanted to point out some of the wonders and complexities of Unicode. – Rob Napier Nov 21 '19 at 18:50
  • @RobNapier thanks so much for these detailed responses! they are extremely informative. :) fyi the twitter link in your bio redirects to the twitter home page. – Crashalot Nov 21 '19 at 18:55
  • @RobNapier given your passion for fonts and hard problems, you don't happen to know how to convert TTF glyphs into SVG paths do you? will delete once you read this. thanks! – Crashalot Nov 21 '19 at 19:03
  • This answer still works: https://stackoverflow.com/questions/47902878/fontforge-export-a-glyph-to-svg-with-fontforge-command-line – Rob Napier Nov 21 '19 at 19:26
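
For anyone landing here: the comments suggest doing the export outside Swift, and the cmap discussion above can be sketched in a few lines of Python. This is only a rough sketch, not a full font parser. It assumes a single-font `.ttf` file (not a `.ttc` collection), reads only the Unicode-flavoured cmap subtables (platform 0, or platform 3 with encoding 1 or 10), and understands only subtable formats 4 and 12. The function names (`parse_cmap`, `unicode_code_points`) are made up for this example; a tool like fonttools or FontForge will handle many more cases.

```python
import struct


def _format4(data, sub):
    """Decode a format 4 (BMP, segment-based) cmap subtable at offset `sub`."""
    seg_count = struct.unpack_from(">H", data, sub + 6)[0] // 2
    end_off = sub + 14
    start_off = end_off + 2 * seg_count + 2  # +2 skips reservedPad
    delta_off = start_off + 2 * seg_count
    range_off = delta_off + 2 * seg_count
    out = {}
    for s in range(seg_count):
        end = struct.unpack_from(">H", data, end_off + 2 * s)[0]
        start = struct.unpack_from(">H", data, start_off + 2 * s)[0]
        delta = struct.unpack_from(">H", data, delta_off + 2 * s)[0]
        ro_pos = range_off + 2 * s
        ro = struct.unpack_from(">H", data, ro_pos)[0]
        for c in range(start, end + 1):
            if c == 0xFFFF:  # terminator segment, not a real character
                continue
            if ro == 0:
                gid = (c + delta) & 0xFFFF
            else:
                # idRangeOffset is relative to its own position in the table
                gid = struct.unpack_from(">H", data, ro_pos + ro + 2 * (c - start))[0]
                if gid:
                    gid = (gid + delta) & 0xFFFF
            if gid:  # glyph id 0 means "missing glyph"
                out[c] = gid
    return out


def _format12(data, sub):
    """Decode a format 12 (full Unicode range, group-based) cmap subtable."""
    num_groups = struct.unpack_from(">I", data, sub + 12)[0]
    out = {}
    for g in range(num_groups):
        start, end, gid0 = struct.unpack_from(">III", data, sub + 16 + 12 * g)
        for c in range(start, end + 1):
            out[c] = gid0 + (c - start)
    return out


def parse_cmap(data):
    """Return {code point: glyph id} merged from the Unicode cmap subtables."""
    num_tables = struct.unpack_from(">H", data, 4)[0]
    cmap_off = None
    for i in range(num_tables):
        tag, _checksum, offset, _length = struct.unpack_from(">4sIII", data, 12 + 16 * i)
        if tag == b"cmap":
            cmap_off = offset
            break
    if cmap_off is None:
        raise ValueError("font has no cmap table")
    _version, num_sub = struct.unpack_from(">HH", data, cmap_off)
    mapping = {}
    for i in range(num_sub):
        platform, encoding, rel = struct.unpack_from(">HHI", data, cmap_off + 4 + 8 * i)
        # Assumption: only Unicode (0, *) and Windows Unicode (3, 1) / (3, 10)
        if not (platform == 0 or (platform == 3 and encoding in (1, 10))):
            continue
        sub = cmap_off + rel
        fmt = struct.unpack_from(">H", data, sub)[0]
        if fmt == 4:
            mapping.update(_format4(data, sub))
        elif fmt == 12:
            mapping.update(_format12(data, sub))
    return mapping


def unicode_code_points(path):
    """Read a .ttf file and return its mapped code points, sorted."""
    with open(path, "rb") as f:
        return sorted(parse_cmap(f.read()))
```

Calling `unicode_code_points("icons.ttf")` (the file name is a placeholder) would give a sorted list of integers, which you can format as `f"U+{cp:04X}"` for export. As noted in the comments, glyphs that are reachable only through substitutions such as ligatures have no code point of their own, so they will not appear in this list.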

0 Answers