
I've followed this article to use FlyingSaucer to convert XHTML to PDF, and it's brilliant, but it has one major drawback: it's ridiculously slow!

I'm finding that it takes between 1 and 2 minutes to render a PDF from an XHTML, regardless of how simple that page is.

Basic code:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.xhtmlrenderer.pdf.ITextRenderer;
import com.lowagie.text.DocumentException;

public class FirstDoc {

    public static void main(String[] args) throws IOException, DocumentException {

        String inputFile = "firstdoc.xhtml";
        String url = new File(inputFile).toURI().toURL().toString();
        String outputFile = "firstdoc.pdf";
        OutputStream os = new FileOutputStream(outputFile);

        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocument(url);
        renderer.layout();
        renderer.createPDF(os);

        os.close();
    }
}

Sample XHTML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>My First Document</title>
        <style type="text/css"> b { color: green; } </style>
    </head>
    <body>
        <p>
            <b>Greetings Earthlings!</b>
            We've come for your Java.
        </p>
    </body>
</html>

Does anyone know how to improve the performance of FlyingSaucer?

Failing that, can anyone recommend an alternative Java library that is effective at rendering a PDF from a URL pointing to an (X)HTML document with external CSS and with images loaded from URLs?

bluish
Edd

5 Answers


I was facing the same problem as Edd.

Sadly, the approach from Marek Piechut's answer to "Java DocumentBuilder: xml parsing is very slow?" didn't fully work for me: my HTML entities got lost along the way.

DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
fac.setNamespaceAware(false);
fac.setValidating(false);
fac.setFeature("http://xml.org/sax/features/namespaces", false);
fac.setFeature("http://xml.org/sax/features/validation", false);
// These two Xerces features stop the parser from reading the external DTD at all:
// fast, but entities declared in the DTD (such as &nbsp;) are then never defined
fac.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
fac.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder builder = fac.newDocumentBuilder();

What finally did the trick were these lines:

DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = fac.newDocumentBuilder();
// FSEntityResolver (org.xhtmlrenderer.resource) serves the XHTML DTDs locally
builder.setEntityResolver(FSEntityResolver.instance());

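By using Flying Saucer's built-in EntityResolver, which resolves the XHTML DTDs from copies bundled with the library instead of downloading them from w3.org, parsing got tremendously faster, and the entities are still defined.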
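To see what an offline EntityResolver changes without pulling in Flying Saucer, here is a JDK-only sketch. It is not FSEntityResolver itself: `InMemoryEntityResolver` and the one-line `MINI_XHTML_DTD` are hypothetical stand-ins for the real XHTML DTDs that Flying Saucer bundles. The parser asks the resolver for the DTD, gets it from memory instead of from the network, and `&nbsp;` still expands to a non-breaking space.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class OfflineDtdDemo {

    // Hypothetical, heavily trimmed stand-in for the XHTML 1.0 Strict DTD:
    // it only declares &nbsp;, which is enough to show that entities survive.
    static final String MINI_XHTML_DTD = "<!ENTITY nbsp \"&#160;\">";

    /** Serves known DTDs from memory, so the parser never goes to w3.org. */
    static class InMemoryEntityResolver implements EntityResolver {
        private final Map<String, String> dtdsByPublicId = new HashMap<>();
        int calls = 0;

        InMemoryEntityResolver() {
            dtdsByPublicId.put("-//W3C//DTD XHTML 1.0 Strict//EN", MINI_XHTML_DTD);
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            calls++;
            String dtd = dtdsByPublicId.get(publicId);
            // Unknown entities get an empty body instead of a network download.
            return new InputSource(new StringReader(dtd != null ? dtd : ""));
        }
    }

    static Document parse(String xhtml, EntityResolver resolver) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
                + "  \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>a&nbsp;b</p></body></html>";
        InMemoryEntityResolver resolver = new InMemoryEntityResolver();
        Document doc = parse(xhtml, resolver);
        String text = doc.getElementsByTagName("p").item(0).getTextContent();
        System.out.println("resolver calls: " + resolver.calls);
        System.out.println("nbsp kept: " + text.equals("a\u00A0b"));
    }
}
```

With the real library you would not write a resolver yourself; passing `FSEntityResolver.instance()` to `setEntityResolver` plays the same role.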

Mister Henson

The problem is that you are probably using this code from the linked article:

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new StringBufferInputStream(buf.toString()));

This way, the builder will try to load the referenced DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Fetching that DTD (and the entity files it references) over the network from w3.org and then parsing it takes a lot of time.

If you are using

ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(url); // not setDocument(document)

Flying Saucer resolves the DTD itself and won't download it. If you want to load a Document yourself instead of passing a URL, see

Adam

I would make 2 recommendations:

  1. Profile it.

  2. Wrap the OutputStream in a BufferedOutputStream

  3. Profile it. (Oops ... I'm repeating myself. Well, you get the picture.)
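For point 2, here is a minimal JDK-only sketch of the buffered variant. `BufferedPdfOutput`, `StreamWriter` and the 64 KiB buffer size are illustrative choices, not part of Flying Saucer. Using try-with-resources also guarantees that the stream is flushed and closed if rendering throws, which the question's code does not do.

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferedPdfOutput {

    /** Anything that writes to a stream, e.g. os -> renderer.createPDF(os). */
    @FunctionalInterface
    interface StreamWriter {
        void writeTo(OutputStream os) throws Exception;
    }

    /**
     * Opens the target file behind a 64 KiB buffer, so many small writes are
     * batched into fewer file-system calls, and always closes (and therefore
     * flushes) the stream, even if the writer throws.
     */
    static void writeBuffered(Path target, StreamWriter writer) throws Exception {
        try (OutputStream os = new BufferedOutputStream(Files.newOutputStream(target), 64 * 1024)) {
            writer.writeTo(os);
        }
    }
}
```

With the renderer from the question (after `renderer.layout()`), the call would look like `BufferedPdfOutput.writeBuffered(Paths.get("firstdoc.pdf"), os -> renderer.createPDF(os));`.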

Stephen C
  • This may be a dumb question, but what do you mean by "profile it"? – Edd Mar 25 '11 at 11:29
  • Using the following partial code replacement: `InputStream is = new FileInputStream(new File(inputFile)); DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder(); Document doc = builder.parse(is); ITextRenderer renderer = new ITextRenderer(); renderer.setDocument(doc, null);` it looks like the holdup is actually in parsing the XHTML file into a DOM object rather than in rendering. – Edd Mar 25 '11 at 11:52
  • I mean, run your application using a Java profiler. For example: hprof - http://java.sun.com/developer/technicalArticles/Programming/HPROF.html – Stephen C Mar 25 '11 at 13:55
  • Also, try wrapping the FileInputStream in a BufferedInputStream. It could be that you are reading / writing to a file system that performs particularly badly with small (e.g. 1 byte) reads / writes. – Stephen C Mar 25 '11 at 14:00
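Timing each stage separately, which is how Edd found that parsing was the slow part, can be done with a small helper like the one below. `SectionTimer.time` is a hypothetical name; it uses `System.nanoTime()`, which is intended for measuring elapsed time, and it prints the duration even if the step throws.

```java
import java.util.concurrent.Callable;

public class SectionTimer {

    /** Runs one step, prints how long it took, and returns the step's result. */
    static <T> T time(String label, Callable<T> step) throws Exception {
        long start = System.nanoTime(); // monotonic, unlike currentTimeMillis()
        try {
            return step.call();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(label + " took " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-ins for the real "parse" and "layout" steps.
        String parsed = time("parse", () -> "document");
        Integer length = time("layout", () -> parsed.length());
        System.out.println("result: " + length);
    }
}
```

Applied to the code in Edd's comment, this would be e.g. `Document doc = SectionTimer.time("parse", () -> builder.parse(is));` followed by `SectionTimer.time("layout", () -> { renderer.layout(); return null; });`.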

Let me start by saying that I used your sample code and sample xhtml, and it "Ran in 2675ms".

I downloaded Flying Saucer R8 and put three of its jars on my classpath:

core-renderer.jar, iText-2.0.8.jar, xml-apis-xerces-2.9.1.jar

I measured the run time by modifying your code with instrumentation...

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.xhtmlrenderer.pdf.ITextRenderer;
import com.lowagie.text.DocumentException;

public class FirstDoc {

    public static void main(String[] args) throws IOException, DocumentException {
        long start = System.currentTimeMillis();
        String inputFile = "firstdoc.xhtml";
        String url = new File(inputFile).toURI().toURL().toString();
        String outputFile = "firstdoc.pdf";
        OutputStream os = new FileOutputStream(outputFile);

        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocument(url);
        renderer.layout();
        renderer.createPDF(os);

        os.close();
        long end = System.currentTimeMillis();
        System.out.println("Ran in " + (end-start) + "ms");
    }
}

Now, this library isn't exactly speedy, but it doesn't seem to take 1-2 minutes either. So now we need to figure out why it's running so slowly for you. Could you please let us know which JDK you're using and on what platform? Also, which version of Flying Saucer are you using?

bconneen
  • I'm using the same libraries and jdk 1.5. – Edd Mar 25 '11 at 12:41
  • I've just run your code and it runs absolutely fine... I'm now very confused as my code is running well also. Thanks very much and sorry if I've wasted your time – Edd Mar 25 '11 at 12:42
  • No problem. Not a waste, since I'd never tried Flying Saucer and it gave me a good excuse to give it a try. I would recommend experimenting with JDK 6 to see if there are performance gains. Also, if you start working with bigger documents, things might slow down; if they do, you might want to try different DOM parsers. – bconneen Mar 25 '11 at 19:12
  • Although this isn't an issue anymore, I'm adding my comment in case somebody else runs into the same thing. I used Flying Saucer in exactly the same way as you did here, except that the HTML file I had to parse was a lot larger, and the whole run took around 118000 ms. When I timed the various sections of the code, I found that the delay came from parsing the HTML before rendering and generating the PDF. Using the code from Mister Henson's answer did the trick: `builder.setEntityResolver(FSEntityResolver.instance())`. Parsing time dropped from 118000 ms to 190 ms! – spydadome Nov 14 '12 at 08:45

We were facing massive performance problems as well. Generating the first PDF took almost a minute. If another generation was triggered while the first one still ran, they would finish almost simultaneously. Once the first PDF was generated, subsequent requests performed much faster.

After some profiling, I found that the bottleneck was the ITextFontResolver that is instantiated when ITextRenderer is initialized. The resolver loads all the fonts it needs through com.lowagie.text.pdf.BaseFont, and that caused the huge delay. BaseFont caches the fonts it creates, which explains why parallel requests finished simultaneously and why subsequent requests were faster.

Our solution is to load the needed fonts when the application initializes. This increases startup time slightly (but not by a minute, because it seems to run in parallel with other initialization work) and lets the first PDF be generated just as fast as any other. To trigger the font loading, we simply create an instance of ITextFontResolver. Using Spring, the solution is as simple as this:

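import org.springframework.beans.factory.InitializingBean;
import org.springframework.stereotype.Component;
import org.xhtmlrenderer.pdf.ITextFontResolver;

@Component
public class FontLoader implements InitializingBean {

    @Override
    public void afterPropertiesSet() {
        // Creating a resolver loads the fonts it needs through BaseFont, which caches
        // them, so the first real PDF request no longer pays that cost
        new ITextFontResolver(null);
    }
}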
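Without Spring, the same idea can be sketched with plain JDK classes: start the warm-up on a background thread when the application starts, and have each render wait for it, which returns immediately once the fonts are cached. `StartupWarmUp` is a hypothetical helper; the warm-up task would be the same `new ITextFontResolver(null)` call used in the Spring bean above.

```java
import java.util.concurrent.CompletableFuture;

public class StartupWarmUp {

    private final CompletableFuture<Void> warmedUp;

    /**
     * Starts the warm-up task on a background thread (the common fork-join pool)
     * as soon as the application starts, instead of on the first PDF request.
     */
    StartupWarmUp(Runnable warmUp) {
        this.warmedUp = CompletableFuture.runAsync(warmUp);
    }

    /**
     * Call before rendering. Blocks only while the warm-up is still running;
     * throws CompletionException if the warm-up task itself failed.
     */
    void awaitWarmUp() {
        warmedUp.join();
    }
}
```

For example, create `new StartupWarmUp(() -> new ITextFontResolver(null))` once at startup and call `awaitWarmUp()` before `renderer.layout()`.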
jBuchholz
  • I had a similar issue. The problem was that the XHTML document being rendered referenced an image that didn't actually exist, which made ITextRenderer spend a painful amount of time looking for it. I worked around it by using my own ITextUserAgent and setting it in the shared context. The workaround is described here: https://stackoverflow.com/questions/4782876/resolving-protected-resources-with-flying-saucer-itextrenderer – Nextor Oct 25 '22 at 12:32
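Inside such a custom user agent, the part that decides how long a bad image reference can stall the render is how the resource stream is opened. A JDK-only sketch of that part, assuming the stall comes from a slow or unreachable server: `FastFailResourceLoader.openWithTimeouts` is a hypothetical helper that you would call from the user agent's stream-opening hook (`resolveAndOpenStream` in the Flying Saucer versions I know of; check yours).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;

public class FastFailResourceLoader {

    /**
     * Opens a resource with explicit connect/read timeouts, so an unreachable
     * image fails within timeoutMillis instead of stalling the render.
     * A missing resource (HTTP 404, or a nonexistent file: URL) makes
     * getInputStream() throw immediately; timeouts have no effect on file: URLs.
     */
    static InputStream openWithTimeouts(String uri, int timeoutMillis) throws IOException {
        URLConnection connection = URI.create(uri).toURL().openConnection();
        connection.setConnectTimeout(timeoutMillis);
        connection.setReadTimeout(timeoutMillis);
        return connection.getInputStream();
    }
}
```

Whatever the hook returns on failure (for example null, so the image is skipped) is up to your user agent; the point is only that the failure arrives quickly.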