I'm running this code in a webapp, both from Eclipse and on Tomcat:

        FileInputStream is = new FileInputStream("C:/Users/admin/Desktop/dummy.txt");

        try {
            FontFactory.register("C:/Workspace/Osmosit/ReportManager/testSvn/ReportManagerCommon/src/main/java/com/osmosit/reportmanager/common/itext/fonts/ARIALUNI.TTF"); 
        } catch (Exception e) {
            e.printStackTrace();
        }



        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(1024);
        Document document = new Document(PageSize.A4);
        PdfWriter writer;

        writer = PdfWriter.getInstance(document, byteArrayOutputStream);
        document.open();

        XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
        document.close();
        byteArrayOutputStream.close();


        FileOutputStream fos = new FileOutputStream("C:/Users/admin/Desktop/prova-web.pdf");
        fos.write(byteArrayOutputStream.toByteArray());
        fos.close();

dummy.txt is a simple HTML file with Arabic and Latin characters:

<div style="font-family: Arial Unicode MS;" ><p>كما. أي مدن العدّ وقام test latin</p><br /></div>

When I run it under Eclipse I get a correct PDF; when it runs on Tomcat I get this:

كما. أي مدن العدّ وقام test latin

PS: I'm using itextpdf version 5.5.8.

Frizz1977

1 Answer

You have an encoding problem. Either you saved dummy.txt using the wrong encoding (e.g. as Latin-1 instead of as UTF-8), or you are reading dummy.txt using the wrong encoding.
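
To see how this kind of garbling can arise, here is a minimal sketch that uses only the JDK, not iText. It encodes a short Arabic/Latin string as UTF-8 and then decodes the same bytes twice, once as windows-1252 and once as UTF-8. The class name EncodingMismatchDemo and the helper decode are made up for this illustration, and windows-1252 is only an assumption about what the Tomcat JVM defaults to; the actual default on your server may differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows what happens when UTF-8 bytes are decoded
// with a different charset than the one they were written in.
class EncodingMismatchDemo {

    // Decodes bytes with an explicitly chosen charset instead of the JVM default.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Three Arabic letters (kaf, meem, alef) followed by Latin text,
        // written with Unicode escapes so the source file encoding does not matter.
        String original = "\u0643\u0645\u0627 test latin";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Assumption: the misbehaving environment decodes as windows-1252.
        String wrong = decode(utf8Bytes, Charset.forName("windows-1252"));
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);

        // The default charset is what gets used when no charset is passed explicitly.
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("Decoded as windows-1252: " + wrong);
        System.out.println("Decoded as UTF-8: " + right);
    }
}
```

In the wrong decoding each Arabic letter turns into two Latin-looking characters while the Latin part survives unchanged, which is the pattern in your Tomcat output. Printing Charset.defaultCharset() in both environments is a quick way to check whether Eclipse and Tomcat really start the JVM with different defaults.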

See "html to pdf convert, cyrillic characters not displayed properly" and adapt the line in which you call parseXHtml():

XMLWorkerHelper.getInstance().parseXHtml(writer, document,
    is, null, Charset.forName("UTF-8"), fontImp);

Take a look at the ParseHtml11 example to find out what fontImp is about.

You are also making another mistake: Arabic is read from right to left, and in your code you aren't defining the run direction. See "Arabic characters from html content to pdf using iText".

In your case, I would put the Arabic text in a table and I would follow the ParseHtml7 example from the official documentation:

public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    // Styles
    CSSResolver cssResolver = new StyleAttrCSSResolver();
    XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontProvider.register("resources/fonts/NotoNaskhArabic-Regular.ttf");
    CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
    // HTML
    HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
    htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
    // Pipelines
    ElementList elements = new ElementList();
    ElementHandlerPipeline pdf = new ElementHandlerPipeline(elements, null);
    HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
    CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

    // XML Worker
    XMLWorker worker = new XMLWorker(css, true);
    XMLParser p = new XMLParser(worker);
    p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));

    PdfPTable table = new PdfPTable(1);
    PdfPCell cell = new PdfPCell();
    cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
    for (Element e : elements) {
        cell.addElement(e);
    }
    table.addCell(cell);
    document.add(table);
    // step 5
    document.close();
}
Bruno Lowagie