XMLWorkerHelper performance slow

Question

I am using iText 5.3 in Java to generate PDFs. I was using HTMLWorker.parseToList(Reader, StyleSheet) to convert a String that contains HTML tags such as bold, italic, href, etc. into PDF elements. I don't want to convert a complete HTML document to PDF; only part of the text in the PDF comes from HTML. For example, a string like "This is test <b>bold</b> text" should be rendered with part of the text in bold.

The performance is good with HTMLWorker.

Since it's deprecated now, I started using XMLWorkerHelper.parseXHtml(ElementHandler, Reader), and I found that its performance is very bad compared to HTMLWorker.

If anyone has any idea about the solution or any other workaround, please let me know.

Below is the sample code. Another post with sample code is "HTML to List using XMLWorker".

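import com.itextpdf.text.Chunk;
import com.itextpdf.text.Element;
import com.itextpdf.text.Font;
import com.itextpdf.text.Phrase;
import com.itextpdf.tool.xml.ElementHandler;
import com.itextpdf.tool.xml.Writable;
import com.itextpdf.tool.xml.pipeline.WritableElement;

import java.util.List;

public class HTMLElementHandler implements ElementHandler {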

    private Phrase phrase;
    private Font font;

    // public (not private) so that code in other files can instantiate the handler
    public HTMLElementHandler(Phrase phrase, Font font) {
        super();
        setPhrase(phrase);
        setFont(font);
    }

    @Override
    public void add(Writable writable) {
        if (writable instanceof WritableElement) {
            List<Element> elements = ((WritableElement) writable).elements();
            for (Element elem : elements) {
                List<Chunk> chunks = elem.getChunks();
                for (Chunk chunk : chunks) {
                    Font chunkFont = chunk.getFont();
                    //Do something with fonts here
                }
                phrase.setFont(font);
                phrase.add(elem);
            }
        }
    }

    public Phrase getPhrase() {
        return this.phrase;
    }
    public void setPhrase(Phrase phrase) {
        this.phrase = phrase;
    }   
    public Font getFont() {
        return this.font;
    }    
    public void setFont(Font font) {
        this.font = font;
    }
}

AnotherJavafile.java

Phrase phrase = new Phrase();
Font font = FontFactory.getFont(FontFactory.getFont("Arial").getFamilyname(), 12, new BaseColor(0, 102, 153));
// parseXHtml(ElementHandler, Reader) expects a Reader, so the HTML string is wrapped in a java.io.StringReader
XMLWorkerHelper.getInstance().parseXHtml(new HTMLElementHandler(phrase, font), new StringReader("This is test <b>bold</b> text"));

I too am unfortunately seeing some rather poor performance from this class, though not from Java, from iTextSharp, the C# port of the same library. — Joel Martinez, Oct 18 '13 at 17:58
I am, too. The culprit is the following method internal to ParseXHtml: iTextSharp.text.FontFactoryImp.RegisterDirectories — Josh Mouch, Jun 20 '14 at 14:50
see also itextsharp issue - http://stackoverflow.com/q/21275800/179972 — John K, Sep 29 '14 at 18:18

score 0 · Answer 1 · answered Dec 16 '17 at 10:13

The cause of this problem is the registration of font directories, which happens as part of the operation, before the (X)HTML is actually parsed. This takes an awful lot of time.
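If you want to check this on your own machine, a rough way is to time a few consecutive parse calls and compare the numbers with and without the font provider change. The following is only a minimal sketch using the JDK: ParseTimer, timeNanos and timeRuns are hypothetical names I made up, and the workload in main is a placeholder that you would replace with your own parseXHtml call.

```java
public class ParseTimer {

    /** Runs the task once and returns the elapsed wall-clock time in nanoseconds. */
    public static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Runs the task `runs` times in a row and returns each duration in milliseconds. */
    public static double[] timeRuns(Runnable task, int runs) {
        double[] millis = new double[runs];
        for (int i = 0; i < runs; i++) {
            millis[i] = timeNanos(task) / 1_000_000.0;
        }
        return millis;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): swap in your own call, e.g. the
        // XMLWorkerHelper.getInstance().parseXHtml(...) line from the question.
        Runnable parse = () -> new StringBuilder("This is test <b>bold</b> text").reverse();

        double[] millis = timeRuns(parse, 5);
        for (int i = 0; i < millis.length; i++) {
            System.out.println("run " + (i + 1) + ": " + millis[i] + " ms");
        }
    }
}
```

Single measurements like these are noisy (JIT warm-up, disk cache), so treat them as an indication rather than a benchmark.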

This can be sidestepped by providing a font provider that will not look for any fonts, i.e. will not register any font directories. This font provider can be created with:

new XMLWorkerFontProvider( XMLWorkerFontProvider.DONTLOOKFORFONTS )

You can supply this font provider as a parameter to XMLWorkerHelper.getInstance( ).parseXHtml( ... ), but not to the overloads that take an ElementHandler as their first parameter. I don't really know why; I only use iText occasionally.

I'll give an example in the case the (X)HTML is in a String:

File tempPdfFile = File.createTempFile( "temp_pdf_", ".pdf" );
tempPdfFile.deleteOnExit( );

try( OutputStream os = new FileOutputStream( tempPdfFile ) )
{
    Document pdfDocument = new Document( PageSize.A4 );
    PdfWriter pdfWriter = PdfWriter.getInstance( pdfDocument, os );
    pdfDocument.open( );

    String htmlText = getHtmlText( ); // your method that returns HTML as text

    XMLWorkerHelper.getInstance( ).parseXHtml ( 
        pdfWriter,
        pdfDocument,
        new ByteArrayInputStream( htmlText.getBytes( StandardCharsets.UTF_8 ) ),
        StandardCharsets.UTF_8,
        new XMLWorkerFontProvider( XMLWorkerFontProvider.DONTLOOKFORFONTS )
    );

    pdfDocument.close( );
    pdfWriter.close( );
}

Desktop.getDesktop( ).open( tempPdfFile );
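If you need the ElementHandler route, as in the question, one possible workaround is to skip the XMLWorkerHelper convenience method and wire the XMLWorker pipeline by hand, which lets you pass the font provider in through CssAppliersImpl. The sketch below is untested on my side and assumes a 5.5.x version of iText and XMLWorker; I have not checked whether all of these classes exist in 5.3. HtmlFragmentParser and parse are hypothetical names.

```java
import com.itextpdf.tool.xml.ElementList;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.html.CssAppliers;
import com.itextpdf.tool.xml.html.CssAppliersImpl;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.ElementHandlerPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class; the name is not part of iText.
public class HtmlFragmentParser {

    /** Parses an (X)HTML fragment into iText elements without registering font directories. */
    public static ElementList parse(String xhtml) {
        // Font provider that does not scan any font directories
        XMLWorkerFontProvider fontProvider =
                new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
        CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);

        // HTML context using the default tag processors
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

        // Default CSS resolver, the same kind XMLWorkerHelper uses
        CSSResolver cssResolver = XMLWorkerHelper.getInstance().getDefaultCssResolver(true);

        // ElementList is iText's ready-made ElementHandler that collects the elements
        ElementList elements = new ElementList();

        // Pipeline: CSS -> HTML -> element handler
        ElementHandlerPipeline end = new ElementHandlerPipeline(elements, null);
        HtmlPipeline html = new HtmlPipeline(htmlContext, end);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

        XMLWorker worker = new XMLWorker(css, true);
        try {
            new XMLParser(worker).parse(new StringReader(xhtml));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return elements;
    }
}
```

Instead of ElementList you should be able to pass your own handler (such as the HTMLElementHandler from the question) to ElementHandlerPipeline, since it accepts any ElementHandler.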
