
In Python, when I render a Unicode character (e.g. a Chinese character) with a selected font, sometimes the font doesn't cover all the common Unicode characters and can't render the character in question. In those cases, if I call the "print" function, the output usually just looks like a square box, regardless of what the underlying character should look like.

Of course, once I print the character, I can look at the output and see that the chosen font lacks that particular character. But is there a way to tell automatically, before I print, whether a character is included in the font, without having to rely on my own eyes?

I'd also like to clarify that I know some fonts are more complete than others. My question is NOT which font to use so that "print" generally gives reasonable output. Please also ignore how I print the character, or whether I actually want to print it at all. My question is simply: for any given font, how do I tell whether a Unicode character is missing from it, without any manual process that relies on human judgement of the output?

MichM
  • OS probably makes a difference, which one are you using? – Mark Ransom May 07 '17 at 17:18
  • How do you know what font is even being used when calling print? Text on stdout could be going to a terminal, a file, some other application... In short, this question is not answerable without more constraints. – gz. May 07 '17 at 17:20
  • I think you are both missing my point. Regardless of whether or how I print the character, I just want to know if a character is included in a font. – MichM May 07 '17 at 17:58
  • You ask about rendering, but reject rendering, so isn't your question actually just "How to test font data for undefined characters in Python?". _Which_ font data? – handle May 07 '17 at 18:37
  • @gz. "Which font is used by console" or "determine if `print` is going to console" would be two additional questions that could be (or perhaps have been) asked. I think this question as worded stands well on its own, if only the detail of which OS would be included. If you're leaving an answer, perhaps those other considerations could be addressed to make the answer more complete. – Mark Ransom May 07 '17 at 19:52
  • @handle ANY font data... That's the point of my question. – MichM May 07 '17 at 22:01
  • @MichM ANY font data won't work, EVERY font data requires a lot of work: You'll have to look at _the_ rendering code and determine how it decides that it can't render the character, i.e. that it's not in the font data. Not much info on [Wikipedia](https://en.wikipedia.org/wiki/TrueType#File_formats). – handle May 08 '17 at 05:36
  • thanks @handle. After much creative googling I think I found an answer. Will post this. – MichM May 08 '17 at 21:55

1 Answer


See https://unix.stackexchange.com/questions/247108/how-to-find-out-which-unicode-codepoints-are-defined-in-a-ttf-file

In short, one can install the fontTools package (pip install fonttools), give it the path to any .ttf font file of interest, and check whether the character's Unicode code point (the integer returned by ord()) is a key in one of the font's Unicode cmap (character-to-glyph mapping) subtables.
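To make the "code point in the cmap" idea concrete, here is a minimal sketch that works on any dict mapping integer code points to glyph names. It assumes you have already obtained such a dict, for example from fontTools' TTFont(path).getBestCmap(); the function name missing_code_points is hypothetical. It reports every character of a string that the font does not map, in the usual U+XXXX notation.

```python
def missing_code_points(text, cmap):
    """Return the code points in `text` that `cmap` does not map, as 'U+XXXX' strings.

    `cmap` is assumed to be a dict from integer code point to glyph name,
    e.g. the result of fontTools' TTFont(path).getBestCmap().
    """
    missing = []
    # dict.fromkeys keeps the unique code points in their original order
    for code_point in dict.fromkeys(ord(char) for char in text):
        if code_point not in cmap:
            missing.append(f"U+{code_point:04X}")
    return missing


# A toy cmap that only covers "A" (U+0041):
print(missing_code_points("A中", {0x41: "A"}))  # ['U+4E2D']
```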

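```python
from fontTools.ttLib import TTFont

fontpath = "/path/to/font.ttf"  # placeholder: specify the path to the font in question
font = TTFont(fontpath)


def char_in_font(unicode_char, font):
    """Return True if any Unicode cmap subtable of the TTFont `font` maps `unicode_char`."""
    for cmap in font['cmap'].tables:
        # Only consult subtables keyed by Unicode code points
        if cmap.isUnicode():
            # cmap.cmap is a dict: code point -> glyph name
            if ord(unicode_char) in cmap.cmap:
                return True
    return False
```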

Then call char_in_font with the character and the loaded font object (not the file path) to check whether the character is included in the font.
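A variant sketch, not the approach from the linked answer, assuming a fontTools version that provides TTFont.getBestCmap(): it accepts either a path or an already-loaded TTFont and checks every character of a string. The name text_in_font is hypothetical. getBestCmap() picks one preferred Unicode subtable (returning None if there is none) instead of looping over all of them; in most fonts the two approaches should agree.

```python
import os


def text_in_font(text, font):
    """Return True if the font's best Unicode cmap maps every character in `text`.

    `font` may be a path to a font file or an already-loaded fontTools TTFont
    (anything with a getBestCmap() method works). Empty text returns True.
    """
    opened_here = isinstance(font, (str, os.PathLike))
    if opened_here:
        from fontTools.ttLib import TTFont  # only needed when given a path
        font = TTFont(font)
    try:
        # getBestCmap() returns None when the font has no Unicode cmap at all
        cmap = font.getBestCmap() or {}
    finally:
        if opened_here:
            font.close()  # release the file we opened ourselves
    return all(ord(char) in cmap for char in text)
```

Usage, with a hypothetical path: text_in_font("中", "/path/to/font.ttf").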

MichM
  • Ordinarily I'd complain about a link-only answer, but in this case it's a link to another StackExchange site... and you provided a summary. – Mark Ransom May 08 '17 at 22:41
  • @MarkRansom So then by your description it's not link-only ;) – MichM May 08 '17 at 22:43
  • Yes and no - the summary by itself isn't really sufficient to code the solution. P.S. congratulations for finding an answer and coming back to inform the rest of us, you may save someone's bacon some day. – Mark Ransom May 08 '17 at 22:50
  • OK thanks. The code is actually pretty simple, but I added it. – MichM May 08 '17 at 23:06
  • [FontTools](https://pypi.python.org/pypi/FontTools) supports other formats. – handle May 09 '17 at 05:35
  • This didn't work for me: `char_in_font('햱','/usr/share/fonts/truetype/hack/Hack-Regular.ttf')` returns `False`, but it shows up in my terminal using that font. (Python 3.7.3 on Debian buster) – Brian Minton Oct 25 '19 at 21:39
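Two things may explain the last comment. First, char_in_font as written expects a loaded TTFont object, not a path string. Second, terminals on Linux typically use font fallback (via fontconfig): when the configured font lacks a glyph, another installed font is likely substituted. A character appearing on screen therefore doesn't prove that Hack itself contains it. The sketch below lists which installed font files do map a given character. It assumes fontTools is installed and that fonts live under /usr/share/fonts (a common Linux location, not a given). The name fonts_containing is hypothetical.

```python
import glob
import os


def fonts_containing(char, font_dir="/usr/share/fonts"):
    """Return paths of .ttf/.otf files under font_dir whose Unicode cmap maps `char`.

    Only lowercase extensions are matched; font_dir is an assumed default.
    """
    paths = sorted(
        path
        for ext in ("ttf", "otf")
        for path in glob.glob(os.path.join(font_dir, "**", f"*.{ext}"), recursive=True)
    )
    if not paths:
        return []
    from fontTools.ttLib import TTFont  # imported only when there is something to scan

    matches = []
    for path in paths:
        try:
            font = TTFont(path)
        except Exception:
            continue  # skip files fontTools cannot open
        try:
            cmap = font.getBestCmap() or {}
        except Exception:
            cmap = {}  # treat an unreadable cmap as covering nothing
        finally:
            font.close()
        if ord(char) in cmap:
            matches.append(path)
    return matches
```

Running fonts_containing("햱") on the commenter's system would show whether Hack-Regular.ttf is among the fonts that actually map that character, or whether the terminal is drawing it from some other font.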