
In our project, we need to generate a PDF from HTML content. We tried Flying Saucer and openhtmltopdf, but the HTML we are converting uses CSS3 syntax, and both libraries seem to have poor CSS3 support. As a result, the generated PDF is incomplete and the layout is broken. Is there any way in Java to generate a proper PDF that looks the same as the page does in a browser?

Here is the code snippet:

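import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.jsoup.nodes.Document;

byte[] renderPdf(String url) throws IOException {
    Document document = Jsoup.connect(url).get();

    try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
        PdfRendererBuilder builder = new PdfRendererBuilder();
        // Pass the page URL as the base URI so relative stylesheet and image links can resolve
        builder.withW3cDocument(new W3CDom().fromJsoup(document), url);
        builder.toStream(outputStream);
        builder.run();

        return outputStream.toByteArray();
    }
}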

We also tried inlining all the CSS into the HTML document, since the original page references external static CSS files. Here is the snippet:

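import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Tag;

for (Element link : document.select("link[rel=stylesheet]")) {
    // absUrl resolves the href (relative or absolute) against the document's base URI
    String cssUrl = link.absUrl("href");

    // execute().body() returns the raw CSS; get().body().text() would parse it as HTML and collapse whitespace
    String css = Jsoup.connect(cssUrl).ignoreContentType(true).execute().body();

    // Store the CSS in a DataNode so it is kept verbatim rather than treated as HTML text
    Element style = new Element(Tag.valueOf("style"), "");
    style.appendChild(new DataNode(css));
    link.replaceWith(style);
}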
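Building each stylesheet URL by string concatenation (`baseUrl + cssFilename`) only works when every `href` is a bare relative path; root-relative (`/static/...`), parent-relative (`../`), protocol-relative (`//cdn...`) and absolute hrefs all break. As a minimal plain-JDK sketch, `java.net.URI.resolve` gives browser-like resolution against the page URL (jsoup's `Element.absUrl("href")` does a comparable resolution against the document's base URI). The class name `HrefResolver` and the example URLs are illustrative only.

```java
import java.net.URI;

/** Hypothetical helper: resolves a stylesheet href against the page URL it appeared on. */
final class HrefResolver {

    // RFC 2396 resolution as implemented by java.net.URI: drops the last path segment
    // of the page URL, applies the href, and normalizes "." and ".." segments.
    // Note: URI.create throws IllegalArgumentException for hrefs with illegal characters (e.g. spaces).
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "https://example.com/app/page.html";
        String[] hrefs = {
            "css/site.css",
            "/static/theme.css",
            "../css/print.css",
            "//cdn.example.com/lib.css",
            "https://other.example.org/x.css"
        };
        for (String href : hrefs) {
            System.out.println(href + " -> " + resolve(page, href));
        }
    }
}
```

Pass the full page URL (including its path) as the base; `java.net.URI` is known to mishandle resolution against a base with an empty path such as `https://example.com`.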
Dmitriy Popov
  • You have 3 options: [1] Use a Java-based PDF generator, such as iText. This is problematic: iText's license (AGPL, or a paid commercial license) is restrictive, and the other alternatives aren't any good. You'd have to ditch the HTML in any case. [2] Let a browser engine render the HTML to PDF. This is problematic, too. [3] Use another ecosystem, such as pdfmake (JavaScript), and run it from Java, e.g. using `ProcessBuilder`. Inconvenient, but it's what I recommend here. – rzwitserloot Jul 05 '23 at 10:00
  • The 3rd option would involve changing how your code works: you'd generate the PDF using that library's constructs and avoid HTML altogether. The problem with option 2 is this: from experience, the Chrome engine's print system is deplorably bad, because the Chrome team doesn't care about printing. Bugs about misinterpreting print-specific CSS have been open for _years_. Firefox is much better, but running Firefox headless is very complicated: it requires a full graphics (GFX) stack, which makes it _very_ heavyweight and hard to install on servers. – rzwitserloot Jul 05 '23 at 10:01

3 Answers


We did something similar in Python with wkhtmltopdf, which also had poor CSS support. However, when we switched to inline CSS, the generated PDF had the proper formatting. You may want to try inline CSS; I'm not sure it will help in your case, but it helped us.


You can try Chrome's headless print-to-PDF mode:

google-chrome --headless --disable-gpu --print-to-pdf='/root/test/test.pdf' /root/test/test.html
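Since the question is about Java, the same Chrome invocation can be launched with `ProcessBuilder`. A minimal sketch, assuming a `google-chrome` binary on the `PATH` and Java 11+ (for `Path.of`); the class and method names (`ChromePdf`, `command`, `render`) are hypothetical, and the flags are exactly the ones used in the command above.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: drive headless Chrome's print-to-PDF from Java. */
final class ChromePdf {

    // Each argument is a separate list element, so no shell quoting is needed.
    static List<String> command(String chromeBinary, Path pdf, String pageUri) {
        return List.of(
                chromeBinary,
                "--headless",
                "--disable-gpu",
                "--print-to-pdf=" + pdf.toAbsolutePath(),
                pageUri);
    }

    // Starts Chrome, waits up to the timeout, and throws if it hangs or exits non-zero.
    static void render(String chromeBinary, Path pdf, String pageUri, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(chromeBinary, pdf, pageUri))
                .redirectErrorStream(true)                     // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // so a full pipe buffer can't block Chrome
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("Chrome timed out after " + timeoutSeconds + " s");
        }
        if (process.exitValue() != 0) {
            throw new IOException("Chrome exited with code " + process.exitValue());
        }
    }
}
```

Usage with the paths from this answer: `ChromePdf.render("google-chrome", Path.of("/root/test/test.pdf"), Path.of("/root/test/test.html").toUri().toString(), 60);`. Chrome also accepts an `http(s)` URL as the last argument, in which case it loads the external stylesheets itself and no CSS inlining is needed.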

finger

You can try Spire.PDF for Java. It can render HTML with inline CSS to PDF. If your page uses external CSS, you'll need to convert it to inline CSS first so that it renders properly in the PDF.

Dheeraj Malik