
I am trying to generate a PDF document with ITextRenderer that contains non-Latin characters; in my case, Bulgarian.

Before calling ITextRenderer, I have a String content that, after some processing (such as cleaning it up with Tidy), looks like this (I can see the value while debugging):

String content:

        <td class="description">Вид на потока</td>
        <td class="description">Статус на потока</td>

This is only part of the String; the full content is valid HTML. I included just this excerpt to show that the encoding is still correct at this point, since the Bulgarian characters are readable.
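To check that the characters also survive the next step, where the String is turned into UTF-8 bytes and parsed with DocumentBuilder, one can parse a fragment the same way and read the text back. This is a minimal sketch: the class and the helper `cellTexts` are hypothetical, the `<td>` excerpt is wrapped in `<table><tr>` only so that it is well-formed, and the Xerces-specific `setFeature` calls from the question are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class EncodingCheck {

    // Hypothetical helper: String -> UTF-8 bytes -> DOM (as in the question),
    // then returns the text content of every <td> element.
    static String[] cellTexts(String content) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(false);
            dbf.setNamespaceAware(false);
            Document doc = dbf.newDocumentBuilder().parse(
                    new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));
            NodeList cells = doc.getElementsByTagName("td");
            String[] texts = new String[cells.getLength()];
            for (int i = 0; i < texts.length; i++) {
                texts[i] = cells.item(i).getTextContent();
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse content", e);
        }
    }

    public static void main(String[] args) {
        String content = "<table><tr>"
                + "<td class=\"description\">Вид на потока</td>"
                + "<td class=\"description\">Статус на потока</td>"
                + "</tr></table>";
        // If these lines print Cyrillic text, the parse step did not damage the encoding.
        for (String text : cellTexts(content)) {
            System.out.println(text);
        }
    }
}
```

If the returned strings equal the originals, the problem is more likely in how the renderer picks fonts than in the String-to-DOM step.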

After that, the code below creates a DOM document, passes it to ITextRenderer, and generates the PDF file. This code is already tested and works for Latin-only content: I was able to generate a PDF for English successfully.

The problem appears when I switch to a language with non-Latin characters (Bulgarian). The generated PDF drops all the Bulgarian characters, and the result is a PDF with a lot of empty lines. This is the part of the code that generates the PDF:

        import java.io.ByteArrayInputStream;
        import java.io.ByteArrayOutputStream;
        import javax.xml.parsers.DocumentBuilder;
        import javax.xml.parsers.DocumentBuilderFactory;
        import org.w3c.dom.Document;
        import org.xhtmlrenderer.pdf.ITextRenderer;
        import com.lowagie.text.pdf.BaseFont; // package depends on the iText version bundled with Flying Saucer

        // Parse the (already tidied) XHTML string without validation or DTD loading
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

        dbf.setValidating(false);
        dbf.setNamespaceAware(false);
        dbf.setFeature("http://xml.org/sax/features/namespaces", false);
        dbf.setFeature("http://xml.org/sax/features/validation", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        DocumentBuilder builder = dbf.newDocumentBuilder();

        Document doc = builder.parse(new ByteArrayInputStream(content.getBytes("UTF-8")));

        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

        ITextRenderer renderer = new ITextRenderer();

        // Register the Times fonts (regular, bold, bold italic, italic), embedded, with Identity-H (Unicode) encoding
        renderer.getFontResolver().addFont("fonts/TIMES.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        renderer.getFontResolver().addFont("fonts/TIMESBD.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        renderer.getFontResolver().addFont("fonts/TIMESBI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        renderer.getFontResolver().addFont("fonts/TIMESI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

        // Lay out the document and write the PDF into memory
        renderer.setDocument(doc, null);
        renderer.layout();
        renderer.createPDF(outputStream);
        outputStream.close();

        // Send the PDF bytes to the client as a download
        byte[] outputBytes = outputStream.toByteArray();
        response.setContentType("application/pdf");
        response.addHeader("Content-Disposition", "attachment; filename=\"exported.pdf\"");
        response.setContentLength(outputBytes.length);
        response.getOutputStream().write(outputBytes);
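The fonts above are registered with `BaseFont.IDENTITY_H`, a Unicode-capable encoding. As an illustration of why a single-byte encoding cannot carry this text, a JDK-only check shows that none of the Bulgarian letters exist in windows-1252 (Cp1252, which is what iText's `BaseFont.WINANSI` refers to). This rests on an assumption: if the renderer ends up drawing the text with a built-in font in that encoding instead of one of the registered TTF files, those characters have no glyph, which would be consistent with the empty lines. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class Cp1252Check {

    // Hypothetical helper: returns every character of `text` that has no
    // representation in windows-1252 (Cp1252), preserving the original order.
    static String unencodable(String text) {
        CharsetEncoder cp1252 = Charset.forName("windows-1252").newEncoder();
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (!cp1252.canEncode(ch)) {
                missing.append(ch);
            }
            i += Character.charCount(cp);
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + unencodable("Status") + "]");
        System.out.println("[" + unencodable("Статус на потока") + "]");
    }
}
```

For "Status" nothing is reported, while for "Статус на потока" only the spaces are encodable, so every Cyrillic letter comes back.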

I have tried several things (mainly related to encoding) but unfortunately I haven't found a solution yet. Probably I am missing something obvious here :)

I am not sure whether this matters, but I am using Spring and this code runs inside a controller.

Any help will be appreciated.

Thanks

alexandros
  • Is your HTML specifying the UTF-8 encoding? Are your font files being found in that path? Take a look at [this gist](https://gist.github.com/643173745182c9becc57) that says it works for Chinese characters on Linux by providing a path to the default location of fonts in the system. – Christian Apr 19 '12 at 15:40
  • Thanks for the reply. Do you think it is a font issue? Do I need specific fonts to display non-Latin characters? I am quite sure my fonts are in the right location, but I will give it a try and let you know – alexandros Apr 19 '12 at 18:23
  • Hello. I double-checked it: the fonts are loaded properly. I also ran the FontTest you suggested and hit the same problem there. I load fonts that support Cyrillic characters, but the PDF ignores them and prints empty lines. Any suggestions? – alexandros Apr 20 '12 at 12:44
  • I just added a new post that explains my problem in more detail: http://stackoverflow.com/questions/10250606/generation-of-pdf-from-html-with-non-latin-characters-using-itext-does-not-work – alexandros Apr 20 '12 at 17:24
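On the first comment's question about declaring UTF-8: a minimal sketch of wrapping the body markup in an XHTML page with an XML declaration, a UTF-8 meta tag, and a CSS rule that selects a font by name. It relies on two assumptions not stated in the question: that Flying Saucer only uses a font registered with `addFont(...)` when the CSS `font-family` asks for it, and that the family name stored inside TIMES.TTF is "Times New Roman". The helper `wrapXhtml` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XhtmlWrapper {

    // Hypothetical helper: wraps body markup in a minimal XHTML page that declares
    // UTF-8 and requests the given font family for the whole body via CSS.
    static String wrapXhtml(String bodyMarkup, String fontFamily) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n"
                + "<head>\n"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"/>\n"
                + "<style type=\"text/css\">body { font-family: \"" + fontFamily + "\"; }</style>\n"
                + "</head>\n"
                + "<body>" + bodyMarkup + "</body>\n"
                + "</html>";
    }

    public static void main(String[] args) throws Exception {
        // "Times New Roman" is an assumed family name for the registered TIMES.TTF
        String xhtml = wrapXhtml(
                "<table><tr><td class=\"description\">Вид на потока</td></tr></table>",
                "Times New Roman");
        // Parse it the same way as the question does, to make sure it is still well-formed
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getElementsByTagName("style").item(0).getTextContent());
    }
}
```

Running `main` prints the style rule read back from the parsed DOM, which confirms the wrapped page still parses with the same DocumentBuilder approach used in the question.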
