
From a scripting language (Python or Ruby, say) on a Debian-based system, I would like to find either one of:

  1. All the Unicode codepoints that a particular font has glyphs for
  2. All the fonts that have glyphs for a particular Unicode codepoint

(Obviously either 1 or 2 can be derived from the other, so whichever is easier would be great.) I have done this in the past by running:

fc-list : file charset

... and parsing the charset at the end of each output line, based on this code from fontconfig, but it seems to me that there ought to be a much simpler way of doing this.
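
For illustration, the fontconfig approach above could be scripted along these lines. This is only a sketch: it assumes a fontconfig recent enough that `%{charset}` prints space-separated hexadecimal ranges such as `20-7e a0-17f`, and it uses `fc-list --format` with a tab separator rather than the default output. The function names are made up for this example.

```python
import subprocess


def parse_charset(spec):
    """Expand a charset string such as '20-7e a0' into a set of code points."""
    codepoints = set()
    for token in spec.split():
        first, _, last = token.partition("-")
        start = int(first, 16)
        end = int(last, 16) if last else start  # a bare value is a 1-item range
        codepoints.update(range(start, end + 1))
    return codepoints


def parse_listing(text):
    """Map each font file to its code points, given 'path<TAB>charset' lines."""
    coverage = {}
    for line in text.splitlines():
        path, sep, spec = line.partition("\t")
        if sep:  # skip lines that do not contain both fields
            # A collection file can list several faces, so merge their sets.
            coverage.setdefault(path, set()).update(parse_charset(spec))
    return coverage


def fonts_covering(codepoint, coverage):
    """Return the sorted font files whose charset contains the code point."""
    return sorted(path for path, chars in coverage.items() if codepoint in chars)


def installed_coverage():
    """Query fontconfig for all installed fonts (needs fc-list on PATH)."""
    result = subprocess.run(
        ["fc-list", "--format", "%{file}\t%{charset}\n"],
        capture_output=True, text=True, check=True,
    )
    return parse_listing(result.stdout)
```

With that in place, `installed_coverage()` answers question 1 for every font at once, and `fonts_covering(0x2603, installed_coverage())` answers question 2 for a single code point.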

(I'm not completely sure this is the right StackExchange site for this question, but I am looking for an answer that can be used programmatically.)

Mark Longair
  • "There ought to be a simpler way"? Do you know how many font formats there are? And you want to be able to process *all* of them?! – Kerrek SB Apr 09 '13 at 08:06
  • @Kerrek SB: I know (of course!) that there are many different font formats, but we have libraries that deal with that - for example, the fontconfig command I gave in the question does give you the information I'm after for fonts of several different formats. – Mark Longair Apr 09 '13 at 08:12
  • Related: http://stackoverflow.com/questions/4458696/finding-out-what-characters-a-font-supports – leonbloy Apr 09 '13 at 14:26
  • This Python script works great: http://unix.stackexchange.com/a/268286/26952 – Skippy le Grand Gourou Feb 07 '17 at 13:26

2 Answers

7

I would try any of the FreeType 2 language bindings. Here's a Perl solution to list the Unicode code points of a font using Font::FreeType:

use strict;
use warnings;
use Font::FreeType;

# Print, in hex, every code point covered by the font's character map.
Font::FreeType->new->face('DejaVuSans.ttf')->foreach_char(sub {
    printf("%04X\n", $_->char_code);
});
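
Since the question mentions Python, here is a rough equivalent of the same idea. It is a sketch that assumes the third-party freetype-py binding, whose `Face.get_chars()` yields `(char_code, glyph_index)` pairs; the helper names `covered_codepoints` and `open_face` are invented for this example.

```python
def covered_codepoints(face):
    """Return the sorted code points in the face's active character map.

    `face` is any object with a get_chars() method yielding
    (char_code, glyph_index) pairs, as freetype-py's Face does.
    """
    # Glyph index 0 is the "missing glyph", so those pairs are skipped.
    return sorted({code for code, glyph_index in face.get_chars() if glyph_index})


def open_face(path):
    """Load a font file with freetype-py (assumed installed: pip install freetype-py)."""
    import freetype  # imported lazily so covered_codepoints has no dependency
    return freetype.Face(path)
```

To reproduce the Perl output, something like `for code in covered_codepoints(open_face('DejaVuSans.ttf')): print(f"{code:04X}")` should do.
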
nwellnhof
  • +1 Thanks, that's very helpful - I'll wait a little before ticking "accept" in case there are other answers. – Mark Longair Apr 09 '13 at 15:34
  • Any idea why this doesn’t seem to notice glyphs that are allocated to private use areas, like alternate swashes? – tchrist Jan 14 '15 at 16:17
  • No, but it's certainly not a problem rooted in the Perl bindings. From a quick glance at the freetype2 source code, maybe [`find_unicode_charmap`](http://git.savannah.gnu.org/cgit/freetype/freetype2.git/tree/src/base/ftobjs.c#n973) chooses the wrong charmap? – nwellnhof Jan 15 '15 at 00:02
  • To install the module on Debian/Ubuntu systems: `sudo apt install libfont-freetype-perl`. – mivk Sep 20 '22 at 14:42
4

I've recently listed the mapping from Unicode codepoints to glyphs in a TTF using TTX/FontTools. That tool is written in Python, so it matches the Python tag in your post. The command

ttx -t cmap foo.ttf

will generate an XML file foo.ttx which describes that mapping, for various environments and encodings. See e.g. this reference for a description of what the platform and encoding identifiers actually mean. I assume that the package can be used as a library as well as a command line tool, but I have no experience there.
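
As a sketch of how that XML could be consumed from Python using only the standard library: the code below assumes the usual TTX layout, where `cmap` contains `cmap_format_*` subtables carrying `platformID` and `platEncID` attributes and `<map code="0x20" name="space"/>` entries. It keeps only the subtables that hold Unicode mappings (platform 0, or platform 3 with encoding 1 or 10). The function names are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_unicode_subtable(platform_id, encoding_id):
    """True for the platform/encoding pairs that hold Unicode mappings."""
    return platform_id == 0 or (platform_id == 3 and encoding_id in (1, 10))


def codepoints_from_ttx(source):
    """Collect {code point: glyph name} from the cmap table of a .ttx dump.

    `source` is a file name or file object, as accepted by ElementTree.parse.
    """
    mapping = {}
    root = ET.parse(source).getroot()
    for subtable in root.findall("./cmap/*"):
        if "platformID" not in subtable.attrib:
            continue  # not a subtable, e.g. the tableVersion element
        platform_id = int(subtable.get("platformID"))
        encoding_id = int(subtable.get("platEncID", "-1"))
        if not is_unicode_subtable(platform_id, encoding_id):
            continue
        for entry in subtable.findall("map"):
            code = entry.get("code")
            if code is not None:  # variation-sequence entries lack "code"
                mapping[int(code, 16)] = entry.get("name")
    return mapping
```

Calling `sorted(codepoints_from_ttx("foo.ttx"))` then gives the covered code points. FontTools can also be imported directly, which skips the XML step: `fontTools.ttLib.TTFont("foo.ttf").getBestCmap()` returns a similar code point to glyph name dictionary.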

MvG