
I am trying to use NOTO fonts (https://www.google.com/get/noto/) to display Chinese characters. Here is my sample code, a modified version of an iText sample:

public void createPdf(String filename) throws IOException, DocumentException {

    Document document = new Document();
    PdfWriter.getInstance(document, new FileOutputStream(filename));
    document.open();

    //This is simple English Font
    FontFactory.register("c:/temp/fonts/NotoSerif-Bold.ttf", "my_nato_font");
    Font myBoldFont = FontFactory.getFont("my_nato_font");
    BaseFont bf = myBoldFont.getBaseFont();
    document.add(new Paragraph(bf.getPostscriptFontName(), myBoldFont));


    //This is Chinese font


    //Option 1 :
    Font myAdobeTypekit = FontFactory.getFont("SourceHanSansSC-Regular", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);

    //Option 2 :
     /*FontFactory.register("C:/temp/AdobeFonts/source-han-sans-1.001R/OTF/SimplifiedChinese/SourceHanSansSC-Regular.otf", "my_hans_font");
     Font myAdobeTypekit = FontFactory.getFont("my_hans_font", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);*/



    document.add(Chunk.NEWLINE);
    document.add(new Paragraph("高興", myAdobeTypekit));
    document.add(Chunk.NEWLINE);

    //simplified chinese
    document.add(new Paragraph("朝辞白帝彩云间", myAdobeTypekit));
    document.add(Chunk.NEWLINE);

    document.add(new Paragraph("高兴", myAdobeTypekit));
    document.add(new Paragraph("The Source Han Sans Traditional Chinese ", myAdobeTypekit));


    document.close();
}
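The sample above writes the Chinese glyphs directly into the Java source, so it only works when the file is compiled with the matching encoding. The `\uXXXX` notation recommended in the comments avoids that dependency. Here is a minimal sketch of a helper that produces those escapes. The class and method names are hypothetical. It escapes every UTF-16 code unit above U+007F and leaves ASCII untouched.

```java
// Hypothetical helper: turns non-ASCII characters into Java Unicode escapes
// so CJK literals can be stored in source files as pure ASCII.
public class UnicodeEscapes {

    // Escapes every UTF-16 code unit above 0x7F as a backslash-u escape
    // (lowercase hex). Characters outside the BMP become two escapes
    // (a surrogate pair), which is how javac expects them to be written.
    public static String toUnicodeEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c); // plain ASCII is kept as-is
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two characters from the question's first Chinese paragraph
        String gaoxing = "\u9ad8\u8208";
        System.out.println(toUnicodeEscapes(gaoxing));
    }
}
```

The printed escapes can be pasted into a string literal in place of the raw glyphs.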

I have downloaded the font files to my machine. I am trying two approaches:

  1. Use the equivalent Adobe font family without embedding it (Option 1)

  2. Embed the OTF file in the PDF (Option 2)

With approach 1, I expect the Chinese characters to appear in the PDF, but only the English text is displayed and the Chinese characters are blank.

With approach 2 (embedding the fonts in the PDF, which is not the route I would prefer), I get an error when opening the PDF.

Update: I also looked at the example at http://itextpdf.com/examples/iia.php?id=214 and its code:

public void createPdf(String filename, boolean appearances, boolean font)
    throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(filename));
    // step 3
    document.open();
    // step 4
    writer.getAcroForm().setNeedAppearances(appearances);
    TextField text = new TextField(writer, new Rectangle(36, 806, 559, 780), "description");
    text.setOptions(TextField.MULTILINE);
    if (font) {
        BaseFont unicode =
            BaseFont.createFont("c:/windows/fonts/arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
        text.setExtensionFont(BaseFont.createFont());
        ArrayList<BaseFont> list = new ArrayList<BaseFont>();
        list.add(unicode);
        text.setSubstitutionFonts(list);
        BaseFont f= (BaseFont)text.getSubstitutionFonts().get(0);
        System.out.println(f.getPostscriptFontName());

    }
    text.setText(TEXT);

    writer.addAnnotation(text.getTextField());
    // step 5
    document.close();
}

When I substitute C:/temp/fonts/NotoSansCJKtc-Thin.otf for c:/windows/fonts/arialuni.ttf, I do not see the Chinese characters. The text to convert is now:

public static final String TEXT = "These are the protagonists in 'Hero', a movie by Zhang Yimou:\n"
    + "\u7121\u540d (Nameless), \u6b98\u528d (Broken Sword), "
    + "\u98db\u96ea (Flying Snow), \u5982\u6708 (Moon), "
    + "\u79e6\u738b (the King), and \u9577\u7a7a (Sky).";
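Of the characters in TEXT, only the twelve Han ideographs need a CJK-capable font. The Latin part renders with any font, which is consistent with seeing the English text while the Chinese stays blank. The following sketch (class name hypothetical) lists those code points using the standard `Character.UnicodeScript` API, available since Java 7.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds the characters a Latin-only font cannot render.
public class HanCharacters {

    public static final String TEXT = "These are the protagonists in 'Hero', a movie by Zhang Yimou:\n"
        + "\u7121\u540d (Nameless), \u6b98\u528d (Broken Sword), "
        + "\u98db\u96ea (Flying Snow), \u5982\u6708 (Moon), "
        + "\u79e6\u738b (the King), and \u9577\u7a7a (Sky).";

    // Returns the code points of s that belong to the Han script, in order,
    // formatted as U+XXXX.
    public static List<String> hanCodePoints(String s) {
        List<String> result = new ArrayList<>();
        s.codePoints()
         .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
         .forEach(cp -> result.add(String.format("U+%04X", cp)));
        return result;
    }

    public static void main(String[] args) {
        List<String> han = hanCodePoints(TEXT);
        System.out.println(han.size() + " Han characters: " + han);
    }
}
```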
Question by vsingh, edited by Morteza Jalambadani

  • 1. You are adding Chinese glyphs AS-IS in your code. That is bad programming. Use the Unicode notation instead. 2. What makes you think the NOTO fonts support Chinese? 3. You define an alias `"my_nato_font"`, but you try to get your font using the alias `"SourceHanSansSC-Regular"` – Bruno Lowagie Mar 24 '15 at 16:34
  • You have adapted your question, but my answer is still valid. I tested the CJK fonts from Google using the code from the official documentation (not your suboptimal code sample) and the Chinese text displays correctly. – Bruno Lowagie Mar 24 '15 at 17:29
  • I tried the example you referred to and updated my question. I do not see the Chinese characters – vsingh Mar 24 '15 at 19:17
  • Sorry about that. Yes, it is an example of filling out a form, but I was focusing more on the Chinese characters than on the form. – vsingh Mar 24 '15 at 19:46
  • As you can see from the screen shot in my answer, creating a PDF with Chinese characters using a NOTO font is really easy, provided that you read the documentation instead of just copy/pasting it. – Bruno Lowagie Mar 24 '15 at 19:51
  • Thanks for the sample code. I am able to view your PDF and see the Chinese characters. I ran your code and can create the PDF, but I get the pop-up "Cannot extract the embedded Font" when I open it. The file size is nearly the same, 228 KB. I am running Windows 7 Professional. Not sure what is going on. – vsingh Mar 24 '15 at 20:08
  • Which viewer are you using? Adobe Reader, right? The size of my file is 227.13 KB. – Bruno Lowagie Mar 24 '15 at 20:11
  • Yes. Adobe Reader XI version 11.0.10. My file is 227 KB – vsingh Mar 24 '15 at 20:13
  • Works for me. I also checked the syntax with Adobe Acrobat's Preflight and it doesn't give me any errors. You should try the PDF on another computer as the problem may be a problem on your local machine. – Bruno Lowagie Mar 24 '15 at 20:15
  • Tried it on 2 other machines. Same issue. It looks like something is wrong with the version of Adobe Reader we have. When I upload and view the file in Chrome, it looks perfect. I uploaded the file at this URL: http://www.skill-guru.com/blog/wp-content/uploads/2015/03/chinese.pdf. Can you please save the file as a PDF and check whether you are able to view it? Which version of Adobe Reader do you have? – vsingh Mar 24 '15 at 20:28
  • iText 4.2.1 and itext-asian 5.2.0, pulled from the Maven repository. – vsingh Mar 24 '15 at 20:40
  • There is no such thing as iText 4.2.1. If there is, it is not an official version. Throw it away as far as you can and never use it again. – Bruno Lowagie Mar 24 '15 at 20:41
  • I had been using http://mvnrepository.com/artifact/com.lowagie/itext. I am now using http://mvnrepository.com/artifact/com.itextpdf/itextpdf/5.5.5 and it works perfectly. That was the problem! Thanks a lot for your help – vsingh Mar 24 '15 at 20:42
  • Although my name is on those releases, I am not responsible for them. I'll see if I can have them removed. – Bruno Lowagie Mar 24 '15 at 20:47
  • The group ID in Maven is the same as that of the old version of iText, com.lowagie. That is what makes it confusing. If you search for iText in the Maven repository, version 5.5.5 is nowhere in the first few results. Anyone who has used the old version would pick the com.lowagie group ID rather than the new one. A note on your site would also be helpful – vsingh Mar 25 '15 at 13:06
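Since the confusion came down to which artifact was on the classpath, one quick check is to probe for each generation's classes at runtime. This is a sketch under an assumption: iText 5 classes live in the `com.itextpdf.text` package, while the `com.lowagie` artifacts (including the 4.2.x releases) use `com.lowagie.text`. The class name `ITextClasspathCheck` is hypothetical.

```java
// Hypothetical diagnostic: reports which iText package families are loadable.
public class ITextClasspathCheck {

    // True if the named class can be found, without initializing it.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ITextClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean itext5 = isOnClasspath("com.itextpdf.text.Document");
        boolean lowagie = isOnClasspath("com.lowagie.text.Document");
        System.out.println("com.itextpdf.text found: " + itext5);
        System.out.println("com.lowagie.text found: " + lowagie);
        if (lowagie) {
            // Assumed to indicate an old or unofficial release on the classpath
            System.out.println("Warning: com.lowagie classes found; check your Maven dependencies.");
        }
    }
}
```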

1 Answer


Clearly you are using the wrong font. I have downloaded the fonts from the link you posted. You are using NotoSerif-Bold.ttf, a font that does not support Chinese. However, the ZIP file also contains fonts with CJK in the font name. As described on the site you refer to, CJK stands for Chinese, Japanese and Korean. Use one of those CJK fonts and you'll be able to produce Chinese text in your PDF.
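Because "CJK" spans several scripts, a font for the CHINESE, JAPANESE and KOREAN strings in the NotoExample code needs more than Han ideographs: the Japanese sample mixes Han and Hiragana, and the Korean one is Hangul. This sketch (hypothetical class name, standard `Character.UnicodeScript` only) reports the scripts each sample uses, skipping the COMMON script shared by spaces, digits and punctuation.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: lists the Unicode scripts a string needs a font to cover.
public class CjkScripts {

    // Distinct scripts in s, in order of first appearance, ignoring COMMON.
    public static Set<Character.UnicodeScript> scriptsOf(String s) {
        Set<Character.UnicodeScript> scripts = new LinkedHashSet<>();
        s.codePoints()
         .mapToObj(Character.UnicodeScript::of)
         .filter(sc -> sc != Character.UnicodeScript.COMMON)
         .forEach(scripts::add);
        return scripts;
    }

    public static void main(String[] args) {
        // Same strings as the CHINESE, JAPANESE and KOREAN constants in NotoExample
        String chinese = "\u5341\u950a\u57cb\u4f0f";
        String japanese = "\u8ab0\u3082\u77e5\u3089\u306a\u3044";
        String korean = "\ube48\uc9d1";
        System.out.println("Chinese:  " + scriptsOf(chinese));
        System.out.println("Japanese: " + scriptsOf(japanese));
        System.out.println("Korean:   " + scriptsOf(korean));
    }
}
```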

Take a look at the NotoExample in which I use one of the fonts from the ZIP file you refer to. It creates a PDF that looks like this:

[Screenshot of the resulting PDF]

This is the code I used:

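import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;

public class NotoExample {

    // A CJK variant of Noto; the non-CJK Noto fonts have no Chinese glyphs
    public static final String FONT = "resources/fonts/NotoSansCJKsc-Regular.otf";
    public static final String TEXT = "These are the protagonists in 'Hero', a movie by Zhang Yimou:\n"
        + "\u7121\u540d (Nameless), \u6b98\u528d (Broken Sword), "
        + "\u98db\u96ea (Flying Snow), \u5982\u6708 (Moon), "
        + "\u79e6\u738b (the King), and \u9577\u7a7a (Sky).";
    public static final String CHINESE = "\u5341\u950a\u57cb\u4f0f";
    public static final String JAPANESE = "\u8ab0\u3082\u77e5\u3089\u306a\u3044";
    public static final String KOREAN = "\ube48\uc9d1";

    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document();
        // Write the PDF to the path passed in as dest
        PdfWriter.getInstance(document, new FileOutputStream(dest));
        document.open();
        // Load the font by file path, with Identity-H encoding, embedded in the PDF
        Font font = FontFactory.getFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Paragraph p = new Paragraph(TEXT, font);
        document.add(p);
        document.add(new Paragraph(CHINESE, font));
        document.add(new Paragraph(JAPANESE, font));
        document.add(new Paragraph(KOREAN, font));
        document.close();
    }
}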

You claim that Adobe Reader XI doesn't show the Chinese glyphs, but instead shows a "Cannot extract the embedded Font" message. I cannot reproduce this [*]. I have even checked the file with Preflight in Adobe Acrobat, but no errors were found:

[Screenshot of the Preflight results]

[*] Update: this problem can be reproduced if you use iText 4.2.x, a version that was released by somebody unknown to iText Group NV. Please use only iText 5 or later.
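If a build should fail fast when an older or unofficial release slips in, a simple guard is one option. The following is a hypothetical helper that only compares the major number of a version string such as the "4.2.1" and "5.5.5" mentioned in this thread. It does not read the version from the iText jar itself; where that string comes from is left to the caller.

```java
// Hypothetical guard: accepts only version strings whose major number is 5 or higher.
public class ITextVersionGuard {

    // Parses the leading major number of a version such as "5.5.5".
    // Throws NumberFormatException if the string does not start with digits.
    public static int majorVersion(String version) {
        int dot = version.indexOf('.');
        String major = dot < 0 ? version : version.substring(0, dot);
        return Integer.parseInt(major.trim());
    }

    public static boolean isSupported(String version) {
        return majorVersion(version) >= 5;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"4.2.1", "5.5.5"}) {
            System.out.println(v + " supported: " + isSupported(v));
        }
    }
}
```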

Answer by Bruno Lowagie
  • Thanks for the detailed reply. The problem was an incorrect version of iText. If you are pulling from Maven, make sure to use the new artifacts com.itextpdf:itextpdf:5.5.5 and com.itextpdf:itext-asian:5.2.0 – vsingh Mar 25 '15 at 13:07
  • The same example does not work if we replace NotoSansCJKsc-Regular with SourceHanSansSC-Regular. In fact, I tried the same example with the Source Han Sans and Serif fonts, and the CJK characters are invisible in the generated PDF. Does iText have some issue with the Source Han family? – Vaibhav Raj Jun 02 '17 at 09:28
  • @VaibhavRaj Have you already tried iText 7? Are you an iText customer? If there's a bug in Source Han family, you should ask the provider of the Source Han fonts for support. If you want us to look at it, you should have a support contract with us. – Bruno Lowagie Jun 02 '17 at 10:53
  • We are currently using iText 5.5. – Vaibhav Raj Jun 02 '17 at 11:52
  • I tried 'NotoSansCJKsc-Regular.otf' and it is working fine for us, but it is approximately 14.4 MB in size. Do you have a better option than this? @BrunoLowagie – Aanal Shah Feb 20 '19 at 08:47