We are using iText 2.1.7.
We have an embedded rich text editor (CKEditor) whose contents (HTML) are stored in a database. The editor allows contents to be formatted (bold, italic).
We generate PDFs from that HTML using the HTMLWorker.parseToList method. It works well and renders formatted content properly, except when some diacritics are formatted bold or italic (see capture below).
Some code to reproduce the failing behaviour:
// Imports for iText 2.1.7 (com.lowagie packages)
import java.io.StringReader;
import java.util.ArrayList;
import com.lowagie.text.Font;
import com.lowagie.text.FontFactory;
import com.lowagie.text.Paragraph;
import com.lowagie.text.html.simpleparser.HTMLWorker;
import com.lowagie.text.pdf.BaseFont;

ArrayList elements;
// Embedded TrueType font with Identity-H encoding, so it can render the diacritics
Font diacriticReadyFont = FontFactory.getFont("/images/arial.ttf", BaseFont.IDENTITY_H, true);

// Add one normally styled paragraph with Czech diacritics
Paragraph p1 = new Paragraph("", diacriticReadyFont);
elements = HTMLWorker.parseToList(new StringReader("<p>A normal style paragraph with Czech diacritics shows fine : Č,Ć,Š,Ž,Đ</p>"), null);
for (Object element : elements) {
    p1.add(element);
}
getDocument().add(p1);

// Add one mixed style paragraph with standard characters
Paragraph p2 = new Paragraph("", diacriticReadyFont);
elements = HTMLWorker.parseToList(new StringReader("<p>A paragraph with some <em>italic text </em>and <strong>bold text </strong>shows fine</p>"), null);
for (Object element : elements) {
    p2.add(element);
}
getDocument().add(p2);

// Add one bold style paragraph with Czech diacritics
Paragraph p3 = new Paragraph("", diacriticReadyFont);
elements = HTMLWorker.parseToList(new StringReader("<p><strong>However, bold text with Czech diacritics Č,Ć,Š,Ž,Đ will miss some of those diacritics</strong></p>"), null);
for (Object element : elements) {
    p3.add(element);
}
getDocument().add(p3);

// Add one italic style paragraph with Czech diacritics
Paragraph p4 = new Paragraph("", diacriticReadyFont);
elements = HTMLWorker.parseToList(new StringReader("<p><em>Also, italic text with Czech diacritics Č,Ć,Š,Ž,Đ will miss some too</em></p>"), null);
for (Object element : elements) {
    p4.add(element);
}
getDocument().add(p4);

// Forcing the font on "element" paragraphs does not help
Paragraph p5 = new Paragraph("", diacriticReadyFont);
elements = HTMLWorker.parseToList(new StringReader("<p><strong>Forcing the font on \"element\" paragraphs does not help : Č,Ć,Š,Ž,Đ</strong></p>"), null);
for (Object element : elements) {
    // Each parsed <p> comes back as a Paragraph; override its font explicitly
    ((Paragraph) element).setFont(diacriticReadyFont);
    p5.add(element);
}
getDocument().add(p5);
This gives:
According to my analysis (greatly helped by this excellent post: Can't get Czech characters while generating a PDF), the font that HTMLWorker automatically applies to formatted (bold or italic) text seems to be the culprit. As the paragraph 5 example shows, manually forcing the font on the parsed elements does not help.
Any insight?
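One way to probe this hypothesis without iText is to check which of the five test characters a single-byte Western encoding can represent at all. The sketch below assumes (this is my guess, not something confirmed here) that the font HTMLWorker picks for styled text ends up using Cp1252 / WinAnsi rather than Identity-H. The class and method names are hypothetical helpers and use only the JDK:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of iText: lists characters a charset cannot encode.
class WinAnsiCoverageCheck {

    // Returns the characters of 'text' that the named charset cannot encode.
    static String missingFrom(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!encoder.canEncode(c)) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // C-caron, C-acute, S-caron, Z-caron, D-stroke, written as escapes
        // so the source file encoding does not matter.
        String diacritics = "\u010C\u0106\u0160\u017D\u0110";
        // ASSUMPTION: windows-1252 stands in for the encoding of the styled font.
        String missing = missingFrom(diacritics, "windows-1252");
        for (char c : missing.toCharArray()) {
            System.out.printf("U+%04X is not in windows-1252%n", (int) c);
        }
    }
}
```

If the characters reported here are the same ones that disappear from the bold and italic paragraphs, that would point to an encoding fallback on the styled font rather than to missing glyphs in arial.ttf.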