I am creating images of Chinese seal script characters. I have three TrueType fonts for this task (Jin_Wen_Da_Zhuan_Ti.7z, Zhong_Guo_Long_Jin_Shi_Zhuan.7z, Zhong_Yan_Yuan_Jin_Wen.7z, for testing purposes only). Below is how each font renders the Chinese character "我" (I/me) in Microsoft Word.
Here is my Python script:
import numpy as np
from PIL import Image, ImageFont, ImageDraw, ImageChops
import itertools
import os


def grey2binary(grey, white_value=1):
    # Threshold in place: pixels <= 127 become 0, all others become white_value.
    grey[np.where(grey <= 127)] = 0
    grey[np.where(grey > 127)] = white_value
    return grey


def create_testing_images(characters,
                          font_path,
                          save_to_folder,
                          sub_folder=None,
                          image_size=64):
    font_size = image_size * 2
    if sub_folder is None:
        # Default sub-folder: the font file name without its extension.
        sub_folder = os.path.split(font_path)[-1]
        sub_folder = os.path.splitext(sub_folder)[0]
    sub_folder_full = os.path.join(save_to_folder, sub_folder)
    if not os.path.exists(sub_folder_full):
        os.mkdir(sub_folder_full)
    font = ImageFont.truetype(font_path, font_size)
    bg = Image.new('L', (font_size, font_size), 'white')

    for char in characters:
        img = Image.new('L', (font_size, font_size), 'white')
        draw = ImageDraw.Draw(img)
        draw.text((0, 0), text=char, font=font)
        # Crop to the bounding box of everything that differs from the blank background.
        diff = ImageChops.difference(img, bg)
        bbox = diff.getbbox()
        if bbox:
            img = img.crop(bbox)
        img = img.resize((image_size, image_size), resample=Image.BILINEAR)

        img_array = np.array(img)
        img_array = grey2binary(img_array, white_value=255)

        edge_top = img_array[0, range(image_size)]
        edge_left = img_array[range(image_size), 0]
        edge_bottom = img_array[image_size - 1, range(image_size)]
        edge_right = img_array[range(image_size), image_size - 1]

        # Cast to int so the uint8 pixel values cannot overflow while summing.
        criterion = sum(int(v) for v in itertools.chain(edge_top, edge_left,
                                                        edge_bottom, edge_right))

        # Save only if more than half of the border samples are white,
        # which rejects all-black output.
        if criterion > 255 * image_size * 2:
            img = Image.fromarray(np.uint8(img_array))
            img.save(os.path.join(sub_folder_full, char) + '.gif')
where the core snippet is
font = ImageFont.truetype(font_path,font_size)
img = Image.new('L',(font_size,font_size),'white')
draw = ImageDraw.Draw(img)
draw.text((0,0), text=char, font=font)
For example, if you put those fonts in the folder ./fonts and call it with
create_testing_images(['我'], 'fonts/金文大篆体.ttf', save_to_folder='test')
the script will create ./test/金文大篆体/我.gif in your file system.
Now the problem is that, although it works well with the first font, 金文大篆体.ttf (in Jin_Wen_Da_Zhuan_Ti.7z), the script does not work with the other two fonts, even though Microsoft Word renders them correctly. For 中國龍金石篆.ttf (in Zhong_Guo_Long_Jin_Shi_Zhuan.7z), it draws nothing, so bbox will be None. For 中研院金文.ttf (in Zhong_Yan_Yuan_Jin_Wen.7z), it draws a black frame with no character in the picture, so the image fails the criterion test, whose purpose is to reject all-black output.
I used FontForge to check the properties of the fonts and found that the first font, 金文大篆体.ttf (in Jin_Wen_Da_Zhuan_Ti.7z), uses UnicodeBmp, while the other two use Big5hkscs, which is not the encoding scheme of my system. This may be why the font names are unrecognizable on my system.
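The FontForge check described above can probably be approximated from Python as well. The following is a minimal sketch, assuming the third-party fontTools package is installed; the names WINDOWS_ENCODINGS, describe_cmap, and list_cmaps are made up for illustration, and it is an assumption that FontForge's "Big5hkscs" label corresponds to a Windows-platform cmap subtable with encoding ID 4.

```python
# Windows-platform (platformID 3) encoding IDs as listed in the OpenType 'cmap' table spec.
WINDOWS_ENCODINGS = {
    0: "Symbol",
    1: "Unicode BMP",
    2: "ShiftJIS",
    3: "PRC",
    4: "Big5",
    5: "Wansung",
    6: "Johab",
    10: "Unicode full repertoire",
}


def describe_cmap(platform_id, encoding_id):
    """Return a readable label for one cmap subtable header (hypothetical helper)."""
    if platform_id == 3:
        name = WINDOWS_ENCODINGS.get(encoding_id, "unknown (%d)" % encoding_id)
        return "Windows / " + name
    if platform_id == 0:
        return "Unicode platform (encoding %d)" % encoding_id
    if platform_id == 1:
        return "Macintosh (encoding %d)" % encoding_id
    return "platform %d, encoding %d" % (platform_id, encoding_id)


def list_cmaps(font_path):
    """List the cmap subtables of a font file; needs the third-party fontTools package."""
    # Imported lazily so the helpers above stay usable without fontTools.
    from fontTools.ttLib import TTFont

    font = TTFont(font_path)
    return [describe_cmap(t.platformID, t.platEncID) for t in font["cmap"].tables]
```

Calling list_cmaps('fonts/中國龍金石篆.ttf') would then show whether the font ships a Unicode subtable at all; a font that lists only "Windows / Big5" would be consistent with what FontForge reported.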
I also tried to work around this by selecting the font through its garbled name. After installing the fonts, I tried pycairo:
import cairo
# adapted from
# http://heuristically.wordpress.com/2011/01/31/pycairo-hello-world/
# setup a place to draw
surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, 100, 100)
ctx = cairo.Context (surface)
# paint background
ctx.set_source_rgb(1, 1, 1)
ctx.rectangle(0, 0, 100, 100)
ctx.fill()
# draw text
ctx.select_font_face('金文大篆体')
ctx.set_font_size(80)
ctx.move_to(12,80)
ctx.set_source_rgb(0, 0, 0)
ctx.show_text('我')
# finish up
ctx.stroke() # commit to surface
surface.write_to_png('我.png')  # write_to_png always emits PNG data
This again works well with 金文大篆体.ttf (in Jin_Wen_Da_Zhuan_Ti.7z), but still not with the others. For example, neither ctx.select_font_face('中國龍金石篆') (which reports _cairo_win32_scaled_font_ucs4_to_index:GetGlyphIndicesW) nor ctx.select_font_face('¤¤°êÀsª÷¥Û½f') (which draws with the default font) works. (The latter name is the garbled name displayed in the font viewer as shown above, obtained with a line of Mathematica code, ToCharacterCode["中國龍金石篆", "CP950"] // FromCharacterCode, where CP950 is the code page of Big5.)
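For readers without Mathematica, the garbled name can be derived in Python in the same way: encode the real name as CP950, then read each resulting byte as a single Latin-1 character. This is a small sketch; the function name mojibake_name is made up for illustration.

```python
def mojibake_name(name, codec="cp950"):
    """Encode `name` with `codec`, then interpret each byte as one Latin-1 character."""
    return name.encode(codec).decode("latin-1")


# Reproduces the garbled font name quoted above.
garbled = mojibake_name("中國龍金石篆")
print(garbled)
```

Passing a different codec name, such as "cp936", gives the corresponding garbled form for another code page, possibly including non-printable control characters.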
So I think I've tried my best to tackle this issue, but I still cannot solve it. I've also thought of other approaches, such as renaming the font with FontForge or changing the system encoding to Big5, but I would prefer a solution that involves only Python and therefore requires fewer additional actions from the user. Any hints will be greatly appreciated. Thank you.
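One Python-only direction that may be worth testing is the encoding argument of Pillow's ImageFont.truetype, which selects a non-Unicode character map such as "big5". Pillow's documentation notes that this does not re-encode the text passed to later calls, so the sketch below assumes the character has to be supplied as its Big5 code. It has not been verified against the fonts in question, and the helper names to_charmap_text and render_with_big5_charmap are made up for illustration.

```python
def to_charmap_text(char, codec="big5"):
    """Return a one-character string whose code point equals `char`'s code in `codec`."""
    raw = char.encode(codec)
    return chr(int.from_bytes(raw, "big"))


def render_with_big5_charmap(char, font_path, font_size=128):
    """Draw `char` using the font's Big5 character map (untested assumption)."""
    # Imported lazily so to_charmap_text stays usable without Pillow.
    from PIL import Image, ImageDraw, ImageFont

    # encoding="big5" asks FreeType to select the font's Big5 charmap.
    font = ImageFont.truetype(font_path, font_size, encoding="big5")
    img = Image.new("L", (font_size, font_size), "white")
    draw = ImageDraw.Draw(img)
    # fill=0 draws in black explicitly instead of relying on the default ink.
    draw.text((0, 0), text=to_charmap_text(char), font=font, fill=0)
    return img
```

If this renders the glyph, the same two changes (the encoding argument and the converted text) could be dropped into the draw.text call of the script above; if it does not, the font's character map is probably organised differently from what is assumed here.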
To the moderators of Stack Overflow: this problem may seem "too localized" at first glance, but it could happen with other languages, encodings, and fonts, and the solution can be generalized to those cases, so please don't close it for this reason. Thank you.
UPDATE: Weirdly, Mathematica can recognize the font name in CP936 (GBK, which can be thought of as my system encoding). Take 中國龍金石篆.ttf (in Zhong_Guo_Long_Jin_Shi_Zhuan.7z) as an example:
But setting ctx.select_font_face('ÖÐøý½ðʯ*') does not work either; it creates the character image with the default font.