
I am trying to create a PDF file from an HTML document that uses an Arabic font named Dubai Font. The font renders correctly in the HTML file, but after conversion a few Arabic characters are missing from the PDF.

I used the ParseHtml9 example mentioned in this Stack Overflow question.

The iText jar used is itextpdf-5.5.8.jar.

Please note: this code works as expected with the Noto Naskh Arabic font. The problem occurs only with the Dubai font.

ParseHtml9.java :-

// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));

// step 3
document.open();

// step 4
// Styles
CSSResolver cssResolver = new StyleAttrCSSResolver();

XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register("resources/Dubai-Regular.ttf");

CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);

HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

// Pipelines
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));

// step 5
document.close();

HTML file:-

<html>
<body>
<table>
  <tr>
    <td>Lawrence of Arabia</td>
    <td dir="rtl" style="font-family: Dubai">لورانس العرب</td>
  </tr>
</table>
</body>
</html>

HTML Output is:-

Lawrence of Arabia لورانس العرب

PDF Output is:-

Lawrence of Arabia لونس لعر

A few characters, such as ر, ا and ب, are not rendered in the PDF.

Please help me to fix this. Thanks in advance.

euanGrous
  • I don't have the Dubai font installed. It's possible that the font is missing those characters; in HTML, the browser can fall back to another font for the missing characters; perhaps it's not doing that in the PDF. – Erwin Bolwidt Mar 31 '19 at 09:22
  • @ErwinBolwidt, Thanks! You can get the Dubai font [here](https://dubaifont.com/download). Just to add some more information: I recently got this working with iText 7, and other online HTML-to-PDF converters also render it correctly. Is there any way to fix this? – euanGrous Apr 01 '19 at 05:27
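
One way to test the hypothesis from the first comment (that the font is missing some glyphs) is to ask the font file directly which code points it maps. The sketch below is only a diagnostic, not a fix: it uses `java.awt.Font` from the JDK rather than iText, and the class name `GlyphCoverageCheck` and the helper `missingCodePoints` are made up for illustration. It also probes the isolated presentation forms of ا, ب and ر (U+FE8D, U+FE8F, U+FEAD), on the assumption that iText 5 shapes Arabic text into presentation forms before looking up glyphs; that assumption is worth verifying against the iText source.

```java
import java.awt.Font;
import java.io.File;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;

public class GlyphCoverageCheck {

    // Returns the distinct non-whitespace code points of `text` that the
    // given predicate reports as not displayable, formatted as "U+XXXX".
    public static List<String> missingCodePoints(String text, IntPredicate canDisplay) {
        return text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .distinct()
                .filter(cp -> !canDisplay.test(cp))
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        // Font path taken from the question; pass another path as the first argument.
        File fontFile = new File(args.length > 0 ? args[0] : "resources/Dubai-Regular.ttf");
        if (!fontFile.isFile()) {
            System.out.println("Font file not found: " + fontFile);
            return;
        }
        Font font = Font.createFont(Font.TRUETYPE_FONT, fontFile);

        // The Arabic sample text from the HTML file, written with escapes
        // so the source compiles regardless of file encoding.
        String baseText = "\u0644\u0648\u0631\u0627\u0646\u0633 \u0627\u0644\u0639\u0631\u0628";

        // Assumption: isolated presentation forms of alef, beh and reh,
        // which a shaping step might substitute for the base letters.
        String isolatedForms = "\uFE8D\uFE8F\uFEAD";

        System.out.println("Missing base letters: "
                + missingCodePoints(baseText, cp -> font.canDisplay(cp)));
        System.out.println("Missing isolated forms: "
                + missingCodePoints(isolatedForms, cp -> font.canDisplay(cp)));
    }
}
```

If the base letters are reported as present but the isolated forms are missing, that would suggest the problem lies in how glyphs are looked up during conversion rather than in the font lacking the letters altogether. Running the same check against the Noto Naskh Arabic font file gives a comparison point, since that font is the one that converts correctly.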
