I want to convert an HTML page that contains Arabic characters to a PDF file using Flying Saucer, but in the generated PDF the letters are not joined into their combined forms and the text is printed backwards.

HTML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    </head>

    <body style="font-size:15px;font-family: Arial Unicode MS;">

        <center  style="font-size: 18px; font-family: Arial Unicode MS;">
            <b>
                <i style="font-family: Arial Unicode MS;">
                    &#x062C;&#x0645;&#x064A;&#x0639; &#x0627;&#x0644;&#x062D;&#x0642;&#x0648;&#x0642;<br />
                </i>
            </b>
        </center>
    </body>
</html>

Java Excerpt:

String inputFile = "c:\\html.html";
String url = new File(inputFile).toURI().toURL().toString();
String outputFile = "c:\\html.pdf";
OutputStream os = new FileOutputStream(outputFile);

ITextRenderer renderer = new ITextRenderer();
renderer.getFontResolver().addFont("c://ARIALUNI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

renderer.setDocument(url);
renderer.layout();
renderer.createPDF(os);
os.close();

Actual PDF Result: [image: actual result]

Expected PDF Result: [image: expected result]

What can I do to obtain the right result?

riddle_me_this
Samy Louize Hanna
  • Are you actually trying to convert a canvas image to PDF? – CoderNeji Jul 07 '15 at 06:53
  • This looks like a flying-saucer bug to me. Arabic unicode characters are in a well-defined range, and are (obviously) known to be RTL (right to left). Clearly the browser is rendering RTL, but flying saucer is not. Report the bug to google. –  Jul 13 '15 at 23:47
  • Did you find a solution for the Arabic formatting? – Hana90 Feb 06 '17 at 10:03
  • Thanks, I handled it by drawing the Arabic text into an image on a canvas, so the converted PDF contains an image rather than text :) as in this example: jsfiddle.net/amaan/WxmQR/1 – Hana90 Feb 06 '17 at 11:50

2 Answers

While working with an Arabic font, I faced a similar alignment issue. Arabic is an RTL language, and you need specific JARs to generate PDFs in an RTL language. At the moment your PDF is generated in the normal LTR mode, which is why you get the current output.
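
To make the LTR/RTL point concrete, here is a minimal sketch that uses only the JDK's java.text.Bidi to detect right-to-left text and convert it from logical (storage) order to visual (display) order. It is not part of Flying Saucer; the class name BidiSketch and both helper names are hypothetical. It covers reordering only: it does not join Arabic letters into their contextual forms (that would need a separate shaping step), and it ignores mirrored characters and combining marks. Whether feeding pre-reordered text to a given renderer produces the expected PDF is an assumption you would have to check.

```java
import java.text.Bidi;

// Hypothetical helper class, for illustration only.
final class BidiSketch {

    // True if the text contains characters that need bidirectional handling.
    static boolean containsRtl(String text) {
        return Bidi.requiresBidi(text.toCharArray(), 0, text.length());
    }

    // Converts logical order to visual order, one directional run at a time.
    // Limitation: no letter shaping, no mirroring, no combining-mark handling.
    static String toVisualOrder(String logical) {
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // Odd embedding levels are right-to-left: reverse their characters.
            runs[i] = (levels[i] % 2 == 1)
                    ? new StringBuilder(run).reverse().toString()
                    : run;
        }
        // Put the runs themselves into visual order.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }

    public static void main(String[] args) {
        String logical = "\u062C\u0645\u064A\u0639"; // first word of the question's sample text
        System.out.println(containsRtl(logical));
        System.out.println(toVisualOrder(logical));
    }
}
```

A renderer that lays glyphs out strictly left to right in storage order will show Arabic reversed, which matches the symptom in the question; the sketch only shows what the reordering step does, not a confirmed fix.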

NANCY

Yes, it is related to RTL, but if you have no choice regarding fonts, you can use an Arial font that contains all the characters you need. See https://stackoverflow.com/a/47801584/3335776 for the code.

Somehow the issue is with Flying Saucer's default fonts.

You can find the complete article here.

LNT