itextpdf HTML to PDF containing Cyrillic letters

Question

I have asked another question about this problem but I couldn't make it work. I changed my code, so now it's something like this:

import java.io.FileOutputStream;
import java.io.StringReader;

import com.itextpdf.text.Document;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;
public class HTM {

    public static void main(String ... args ) {
        try {
            Document document = new Document(PageSize.LETTER);
            PdfWriter pdfWriter = PdfWriter.getInstance
                           (document, new FileOutputStream("C:\\testpdf.pdf"));
            document.open();

            XMLWorkerHelper worker = XMLWorkerHelper.getInstance();

            String htmlString = "<html><head>"
                    + "<meta http-equiv=\"content-type\" content=\"application/xhtml+xml; charset=UTF-8\" />"
                    + "</head><body>"
                    + "<h1>Zdravo Кристијан!</h1>"
                    + "</body></html>";


            worker.parseXHtml(pdfWriter, document, new StringReader(htmlString));
            document.close();
            System.out.println("Done.");
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

My problem is that the PDF doesn't display the Cyrillic characters. I know how to create a simple PDF with different charsets and fonts, but I want to convert an HTML file or string (in my case, an HTML string) into a PDF. Thanks in advance.

Are you sure the used fonts **support** Cyrillic characters? Read http://stackoverflow.com/questions/26631815/cant-get-czech-characters-while-generating-a-pdf for a similar problem. — Jongware, Jan 05 '15 at 12:59
Also, you should make sure that your entire toolchain is set up for UTF-8. — Williham Totland, Jan 05 '15 at 13:00
yes. The problem is that I need a html converted into pdf. I succeeded with paragraphs, but that's not what I need. Do you know how to use fonts in my example? — Kristijan Iliev, Jan 05 '15 at 13:01
Yes, there are plenty of examples here: http://itextpdf.com/sandbox/xmlworker For instance: [ParseHtmlFonts](http://itextpdf.com/sandbox/xmlworker/D06_ParseHtmlFonts), [ParseHtmlAsian 1](http://itextpdf.com/sandbox/xmlworker/D07_ParseHtmlAsian), [ParseHtmlAsian 2](http://itextpdf.com/sandbox/xmlworker/D07bis_ParseHtmlAsian), and [ParseHtmlAsian 3](http://itextpdf.com/sandbox/xmlworker/D07tris_ParseHtmlAsian). If you succeed in making the Asian examples work, then Cyrillic shouldn't be a problem. — Bruno Lowagie, Jan 05 '15 at 15:14
@BrunoLowagie thank you very much. I saw your examples before, but I had some errors, maybe because I was in a rush. Anyway, thanks again — Kristijan Iliev, Jan 06 '15 at 11:48
I was in a rush too, I didn't have the time to provide a complete answer, so I am happy to see that you answered your own question. — Bruno Lowagie, Jan 06 '15 at 11:54

score 3 · Answer 1 · answered Jan 06 '15 at 07:34

Based on the comment from @bruno-lowagie, only a small change to your posted code is needed to get it to work on Windows. For more information on how to specify a particular font, have a look at the examples proposed by Bruno.

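import java.io.FileOutputStream;
import java.io.StringReader;

import com.itextpdf.text.Document;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class HTM {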

    public static void main(String ... args ) {
        try {
            Document document = new Document(PageSize.LETTER);

            PdfWriter pdfWriter = PdfWriter.getInstance(document, new FileOutputStream("testpdf.pdf"));
            document.open();

            XMLWorkerHelper worker = XMLWorkerHelper.getInstance();

            String htmlString = "<html><head>"
                    + "<meta http-equiv=\"content-type\" content=\"application/xhtml+xml; charset=UTF-8\" />"
                    + "</head><body>"
                    + "<p style=\"font-family:courier new\">" // the font to use
                    + "<h1>Zdravo Кристијан!</h1>"
                    + "</p>"
                    + "<h1>Zdravo Кристијан!</h1>"
                    + "</body></html>";

            worker.parseXHtml(pdfWriter, document, new StringReader(htmlString));
            document.close();
            System.out.println("Done.");
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I tried your code and I was shocked how close I was to a solution. Thank you! — Kristijan Iliev, Jan 06 '15 at 12:08

Kristijan Iliev · Accepted Answer · 2015-09-03T10:04:43.397

I tried many things, but every time I was missing something. Thanks @BrunoLowagie and @SubOptimal. Here's the code that I got running with custom fonts. It uses a simple HTML string, but the comments show how it can be done with actual HTML and CSS files.

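import java.io.FileOutputStream;
import java.io.IOException;
import java.io.StringReader;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.CssAppliers;
import com.itextpdf.tool.xml.html.CssAppliersImpl;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

public class HtmlToPdf {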
    public static final String DEST = "/home/christian/Desktop/testDoc.pdf";

    public void createPdf(String file) throws IOException, DocumentException {
        // step 1
        Document document = new Document();

        // step 2
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
        writer.setInitialLeading(12.5f);

        // step 3
        document.open();

        // step 4

        // CSS
        CSSResolver cssResolver = new StyleAttrCSSResolver();
        // CssFile cssFile = XMLWorkerHelper.getCSS(new FileInputStream(CSS));
        // cssResolver.addCss(cssFile);

        // HTML
        XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
        fontProvider.register("fonts/Arimo-Regular.ttf");
        fontProvider.register("fonts/Arimo-Bold.ttf");
        fontProvider.register("fonts/Arimo-Italic.ttf");
        fontProvider.addFontSubstitute("lowagie", "Arimo");
        CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

        // Pipelines
        PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
        HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

        // XML Worker
        XMLWorker worker = new XMLWorker(css, true);
        XMLParser p = new XMLParser(worker);

        // p.parse(new FileInputStream(HTML));
        String htmlContent = " HERE GOES HTML CODE ";
        p.parse(new StringReader(htmlContent));
        // step 5
        document.close();
    }

    public static void main(String[] args) throws IOException, DocumentException {
        // Instantiate this class; D06_ParseHtmlFonts was the name of the sandbox example this code is based on
        new HtmlToPdf().createPdf(DEST);
    }
}

I noticed that it is important to set font-family in the CSS/HTML to an actual font that supports the desired characters, and, for email clients, always to use inline CSS.
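
To illustrate the font-family point, here is a minimal sketch of how the HTML string could be built with an inline font declaration before it is handed to the parser. The class `InlineFontHtml` and its `wrap` method are hypothetical helpers, not part of iText. The sketch assumes the family name ("Arimo" here) matches a font registered with the font provider, and that the parser lets child elements inherit font-family from the body. The Cyrillic text is written with Unicode escapes so that the result does not depend on the encoding of the source file.

```java
// Hypothetical helper (not an iText class): wraps an XHTML fragment in a
// document whose body carries an inline font-family declaration.
class InlineFontHtml {

    // fontFamily is assumed to be the family name of a font that has been
    // registered with the font provider and that contains the needed glyphs.
    static String wrap(String bodyFragment, String fontFamily) {
        // Reject characters that would break out of the style attribute.
        if (fontFamily.contains("\"") || fontFamily.contains(";")) {
            throw new IllegalArgumentException("unsupported font family name: " + fontFamily);
        }
        return "<html><head>"
                + "<meta http-equiv=\"content-type\" content=\"application/xhtml+xml; charset=UTF-8\" />"
                + "</head><body style=\"font-family:" + fontFamily + "\">"
                + bodyFragment
                + "</body></html>";
    }

    public static void main(String[] args) {
        // "Zdravo" followed by the name in Cyrillic, written as Unicode escapes.
        String heading = "<h1>Zdravo \u041A\u0440\u0438\u0441\u0442\u0438\u0458\u0430\u043D!</h1>";
        String html = wrap(heading, "Arimo");
        // This string is what would be wrapped in a StringReader and parsed.
        System.out.println(html);
    }
}
```

In the code above, the result of `wrap` would replace the `htmlContent` placeholder, for example `p.parse(new StringReader(InlineFontHtml.wrap(fragment, "Arimo")))`.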
