I am trying to create a PDF file from an HTML document that uses an Arabic font named Dubai Font. The Dubai font renders correctly in the HTML file, but after conversion a few Arabic characters are not displayed in the PDF.
I have tried the ParseHtml9 example mentioned in this Stack Overflow question.
The iText jar used is itextpdf-5.5.8.jar.
Please note: this code works exactly as expected with the Noto Naskh Arabic font. The problem occurs only with the Dubai font.
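Since the behaviour differs between the two fonts, one possible check is to compare which glyphs each font file exposes. The following is a minimal sketch, not a confirmed diagnosis. It assumes that iText 5 shapes Arabic text into Arabic Presentation Forms-B code points (U+FE70 to U+FEFC) before looking up glyphs, so it tests those forms as well as the base letters. The class name GlyphCoverageCheck and its methods are hypothetical, and java.awt.Font may not report exactly the same coverage that iText reads from the file.

```java
import java.awt.Font;
import java.io.File;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of iText or of the ParseHtml9 example.
final class GlyphCoverageCheck {

    // Code points in Arabic Presentation Forms-B whose compatibility
    // decomposition is exactly the given base letter.
    static List<Integer> presentationForms(int base) {
        String baseText = new String(Character.toChars(base));
        List<Integer> forms = new ArrayList<>();
        for (int cp = 0xFE70; cp <= 0xFEFC; cp++) {
            if (!Character.isDefined(cp)) {
                continue;
            }
            String decomposed = Normalizer.normalize(
                    new String(Character.toChars(cp)), Normalizer.Form.NFKD);
            if (decomposed.equals(baseText)) {
                forms.add(cp);
            }
        }
        return forms;
    }

    // Base letters of `text`, plus their presentation forms, for which
    // hasGlyph returns false. Whitespace is ignored.
    static List<Integer> uncovered(String text, IntPredicate hasGlyph) {
        List<Integer> missing = new ArrayList<>();
        text.codePoints()
            .distinct()
            .filter(cp -> !Character.isWhitespace(cp))
            .forEach(base -> {
                if (!hasGlyph.test(base)) {
                    missing.add(base);
                }
                for (int form : presentationForms(base)) {
                    if (!hasGlyph.test(form)) {
                        missing.add(form);
                    }
                }
            });
        return missing;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: GlyphCoverageCheck <path-to-ttf>");
            return;
        }
        Font font = Font.createFont(Font.TRUETYPE_FONT, new File(args[0]));
        // The Arabic cell text from the HTML sample, written as Unicode escapes.
        String sample = "\u0644\u0648\u0631\u0627\u0646\u0633 \u0627\u0644\u0639\u0631\u0628";
        for (int cp : uncovered(sample, cp -> font.canDisplay(cp))) {
            System.out.printf("no glyph for U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```

Running it once with resources/Dubai-Regular.ttf and once with the Noto Naskh Arabic file would show whether the two fonts differ in the code points they cover.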
ParseHtml9.java :-
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
// step 3
document.open();
// step 4
// Styles
CSSResolver cssResolver = new StyleAttrCSSResolver();
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register("resources/Dubai-Regular.ttf");
CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
// Pipelines
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
// step 5
document.close();
Html file:-
<html>
<body>
<table>
<tr>
<td>Lawrence of Arabia</td>
<td dir="rtl" style="font-family: Dubai">لورانس العرب</td>
</tr>
</table>
</body>
</html>
HTML Output is:-
Lawrence of Arabia لورانس العرب
PDF Output is:-
Lawrence of Arabia لونس لعر
A few characters, such as ر, ا and ب, are not rendered in the PDF.
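To pin down exactly which code points are dropped, the expected text can be compared with the text that appears in the PDF. This is a small sketch under the assumption that the PDF text is the expected text with some characters removed and nothing reordered or added. The class name MissingCharacters is hypothetical, and both strings are the ones quoted above, written as Unicode escapes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for comparing expected and rendered text.
final class MissingCharacters {

    // Walks `expected` from the start and matches it against `actual`.
    // Any code point of `expected` that cannot be matched at the current
    // position of `actual` is reported as missing.
    static List<Integer> missing(String expected, String actual) {
        int[] want = expected.codePoints().toArray();
        int[] got = actual.codePoints().toArray();
        List<Integer> dropped = new ArrayList<>();
        int j = 0;
        for (int cp : want) {
            if (j < got.length && got[j] == cp) {
                j++;              // character survived the conversion
            } else {
                dropped.add(cp);  // character was lost
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        String html = "\u0644\u0648\u0631\u0627\u0646\u0633 \u0627\u0644\u0639\u0631\u0628";
        String pdf = "\u0644\u0648\u0646\u0633 \u0644\u0639\u0631";
        for (int cp : missing(html, pdf)) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```

For the two strings above, the dropped code points are U+0631, U+0627 (twice) and U+0628, which are the letters ر, ا and ب.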
Please help me to fix this. Thanks in advance.