Generating PDF with iText: some Czech characters not showing in HTMLWorker-parsed paragraphs

Question

We are using iText 2.1.7.

We have an embedded rich text editor (CKEditor) whose contents (HTML) are stored in a database. The editor allows the contents to be formatted (bold, italic).

We generate a PDF from that HTML using the HTMLWorker.parseToList method. It works well and renders formatted content properly, except when some diacritics are formatted bold or italic (see the result below).

Code to reproduce the failing behaviour:

    ArrayList elements;
    Font diacriticReadyFont = FontFactory.getFont("/images/arial.ttf", BaseFont.IDENTITY_H, true);

    // Add one normally styled paragraph with Czech diacritics
    Paragraph p1 = new Paragraph("", diacriticReadyFont);
    elements = HTMLWorker.parseToList(new StringReader("<p>A normal style paragraph with Czech diacritics shows fine : Č,Ć,&Scaron;,Ž,Đ</p>"), null);
    for (Object element : elements) {
        p1.add(element);
    }
    getDocument().add(p1);

    // Add one mixed style paragraph with standard characters
    Paragraph p2 = new Paragraph("", diacriticReadyFont);
    elements = HTMLWorker.parseToList(new StringReader("<p>A paragraph with some <em>italic text </em>and <strong>bold text </strong>shows fine</p>"), null);
    for (Object element : elements) {
        p2.add(element);
    }
    getDocument().add(p2);

    // Add one bold style paragraph with Czech diacritics
    Paragraph p3 = new Paragraph("", diacriticReadyFont);
    elements = HTMLWorker.parseToList(new StringReader("<p><strong>However, bold text with Czech diacritics Č,Ć,&Scaron;,Ž,Đ will miss some of those diacritics</strong></p>"), null);
    for (Object element : elements) {
        p3.add(element);
    }
    getDocument().add(p3);

    // Add one italic style paragraph with Czech diacritics
    Paragraph p4 = new Paragraph("", diacriticReadyFont);
    elements = HTMLWorker.parseToList(new StringReader("<p><em>Also, italic text with Czech diacritics Č,Ć,&Scaron;,Ž,Đ will miss some too</em></p>"), null);
    for (Object element : elements) {
        p4.add(element);
    }
    getDocument().add(p4);

    // Forcing the font on "element" paragraphs does not help
    Paragraph p5 = new Paragraph("", diacriticReadyFont);
    elements = HTMLWorker.parseToList(new StringReader("<p><strong>Forcing the font on \"element\" paragraphs does not help : Č,Ć,&Scaron;,Ž,Đ</strong></p>"), null);
    for (Object element : elements) {
        ((Paragraph)element).setFont(diacriticReadyFont);
        p5.add(element);
    }
    getDocument().add(p5);

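This gives the following (screenshot not available): paragraphs 1 and 2 render correctly, while the bold and italic paragraphs 3 to 5 lose some of their diacritics.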

According to my analysis (greatly helped by this excellent post: "Can't get Czech characters while generating a PDF"), it seems that the font HTMLWorker automatically applies to the formatted (bold or italic) text is the culprit. As paragraph 5 example shows, manually forcing this font does not help.
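A quick way to probe this suspicion, as a sketch only: if the font HTMLWorker substitutes for bold or italic runs is created with the WinAnsi (Cp1252) encoding instead of Identity-H (this is an assumption, not something confirmed against the iText 2.1.7 source), then only characters that Cp1252 can represent would survive. The plain-Java check below uses no iText classes; `WinAnsiCheck` and `fitsWinAnsi` are names made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class WinAnsiCheck {

    // Cp1252 (windows-1252) is the charset behind iText's BaseFont.WINANSI constant.
    private static final Charset WIN_ANSI = Charset.forName("windows-1252");

    /** Returns true if the character can be encoded in Cp1252. */
    public static boolean fitsWinAnsi(char c) {
        // Encoders are not thread-safe, so create a fresh one per call.
        CharsetEncoder encoder = WIN_ANSI.newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        // The five sample characters from the test paragraphs, given as
        // escapes so the encoding of this source file does not matter.
        char[] samples = {'\u010C', '\u0106', '\u0160', '\u017D', '\u0110'};
        for (char c : samples) {
            System.out.printf("U+%04X fits WinAnsi: %b%n", (int) c, fitsWinAnsi(c));
        }
    }
}
```

Of the five samples, only U+0160 and U+017D have a slot in Cp1252. If the characters missing from the bold and italic paragraphs are exactly the other three, that would support the idea that the styled text is being encoded as WinAnsi rather than with the Identity-H font passed to the outer paragraph.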

Any insight?

*As paragraph 5 example shows, manually forcing this font does not help.* - setting the `Font` of a `Paragraph` object does not change anything in the objects already added before, it merely changes the font used for plain text you later add to the paragraph; so, it obviously won't help. — mkl, Apr 28 '16 at 14:50
