
When I use Flying Saucer to convert an HTML page containing Chinese characters, the Chinese characters are displayed as boxes.

I have tried two approaches: using the CSS from the answer "Flying Saucer font for unicode characters" and using the code from the answer "Flying Saucer iTextPDF Chinese Fonts". Neither worked. Does anyone have another suggestion?

I have declared the UTF-8 charset in a meta tag, as shown below:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html language="en">
<head>
<meta content="text/html; charset=UTF-8" http-equiv="content-type">   </meta>
<link rel="stylesheet" type="text/css" href="file:///opt/template/employer.css"/>
<link rel="stylesheet" type="text/css" href="file:///opt/template/style.css"/>
<link rel="stylesheet" type="text/css" media="print" href="file:///opt/template/print.css"/>
</head>

Here is the relevant section with the Chinese characters:

<tbody><tr>
                                        <td align="left" width="150" valign="top">
                                            Name
                                        </td>
                                        <td align="left" width="305" valign="top">
                                            <label id="candidateName">VU DINH THE / 你好</label>
                                        </td>
                                    </tr>
                                    <tr>
                                        <td align="left" width="150" valign="top">
                                            Gender/Status
                                        </td>
                                        <td align="left" width="305" valign="top">
                                            <label id="gender">Female</label> / <label id="status">Single
</label>
                                        </td>
                                    </tr>
                                    <tr>
                                        <td align="left" width="150" valign="top">
                                            Date of Birth/Age
                                        </td>
                                        <td align="left" width="305" valign="top">
                                            <label id="dob">12 Sep 1985</label> / <label id="age">30</label>
                                        </td>
                                    </tr>

And the content of print.css:

@font-face {
    font-family: Arial Unicode MS;
    src: url('file:///opt/template/arialuni.ttf');
    -fs-pdf-font-embed: embed;
    -fs-pdf-font-encoding: Identity-H;
}
Tony Vu
  • Can you add the HTML code you're trying to transform, including the Chinese characters that don't render? – obourgain Aug 17 '15 at 06:51
  • @obourgain, I have added the relevant HTML parts and the CSS – Tony Vu Aug 18 '15 at 07:08
  • The code seems correct; it works fine on my PC. The problem may come from the `arialuni.ttf` file. What is the size of the file? – obourgain Aug 19 '15 at 09:24
  • @obourgain, it is 1588364 – Tony Vu Aug 19 '15 at 09:44
  • @obourgain, you are right. I replaced the file downloaded from this http://code.google.com/p/ipwn/downloads/detail?name=arialuni.ttf with the ARIALUNI.TTF from this http://www.rmhoist.com/downloads/font/ and it works now. Thank you very much. Please post your answer so I can accept. – Tony Vu Aug 19 '15 at 10:05

1 Answer


A character rendered as an empty square or rectangle usually means that the font file has no glyph for it, so the renderer has nothing to draw.

In this case, the HTML and CSS are correct, but the arialuni.ttf file is incomplete: at 1,588,364 bytes (about 1.5 MB), it is far too small.

For reference, a complete arialuni.ttf is about 23 MB.
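
If you want to check a font file before blaming the HTML or CSS, a rough sketch like the one below may help. It uses only the standard `java.awt.Font` API to load the TTF and report the first character it cannot display. The class name `FontCoverageCheck` and its helper methods are hypothetical names made up for this illustration, and the default path is simply the one from the question. Note that this only reflects what the Java font machinery sees in the file; it is a sanity check, not a guarantee of what Flying Saucer will render.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.io.File;
import java.io.IOException;

// Hypothetical helper (not part of Flying Saucer): reports the size of a
// TTF file and whether it has glyphs for a sample string.
final class FontCoverageCheck {

    // Index of the first character the font cannot display, or -1 if the
    // font covers the whole string (this is what Font.canDisplayUpTo returns).
    static int firstMissingGlyph(Font font, String text) {
        return font.canDisplayUpTo(text);
    }

    // Turns the index returned above into a readable message.
    static String describe(String text, int missingIndex) {
        if (missingIndex < 0) {
            return "all characters covered";
        }
        int codePoint = text.codePointAt(missingIndex);
        return String.format("no glyph for U+%04X at index %d", codePoint, missingIndex);
    }

    public static void main(String[] args) throws IOException, FontFormatException {
        // Assumption: the path from the question is used when no argument is given.
        File fontFile = new File(args.length > 0 ? args[0] : "/opt/template/arialuni.ttf");
        // "\u4F60\u597D" is the Chinese text from the question's candidateName label.
        String sample = args.length > 1 ? args[1] : "VU DINH THE / \u4F60\u597D";

        System.out.println(fontFile + " is " + fontFile.length() + " bytes");
        Font font = Font.createFont(Font.TRUETYPE_FONT, fontFile);
        System.out.println(describe(sample, firstMissingGlyph(font, sample)));
    }
}
```

Run it with the font path as the first argument. A file that is much smaller than expected, or a "no glyph" message for one of the Chinese characters, would point to the same cause as above.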

obourgain