13

TL;DR:

  1. Any suggestions for converting HTML to PDF or PNG in Node.js without a headless browser instance?
  2. Does anyone use Puppeteer in a production environment? I would like to know about the resource utilisation and performance of running a headless browser in prod.

Longer version:

In a Node.js server we need to convert an HTML string to a PDF or PNG based on the request params. We use Puppeteer to generate the PDF and the PNG (screenshot), deployed in a Google Cloud Function. Locally I run the application in Docker with memory restricted to 100MB, and this seems to work. But in the Cloud Function it throws a memory limit exception when the function is set to 250MB of memory. As a temporary solution we upgraded the Cloud Function to 1 GB.
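For readers unfamiliar with the setup, the flow described above might look roughly like the sketch below. It is not the asker's actual code: `renderHtml`, `htmlToBuffer` and the `format` parameter are hypothetical names, and the Chromium launch flags are an assumption commonly used in containerised environments.

```javascript
// Sketch only: `renderHtml` and `format` are hypothetical names.
// `page` is a Puppeteer Page (or any object exposing the same three methods).
async function renderHtml(page, html, format) {
  // Reject unknown formats before spending time loading the document.
  if (format !== 'pdf' && format !== 'png') {
    throw new Error(`Unsupported format: ${format}`);
  }
  // Load the HTML string directly; no URL or temp file is needed.
  await page.setContent(html, { waitUntil: 'networkidle0' });

  return format === 'pdf'
    ? page.pdf({ format: 'A4', printBackground: true })
    : page.screenshot({ type: 'png', fullPage: true });
}

// Launches one browser per call and always closes it, so a failed render
// does not leave a Chromium process holding memory.
async function htmlToBuffer(html, format) {
  const puppeteer = require('puppeteer'); // loaded lazily; assumed installed
  const browser = await puppeteer.launch({
    // Assumed flags, often needed in containers or cloud functions.
    args: ['--no-sandbox', '--disable-dev-shm-usage'],
  });
  try {
    const page = await browser.newPage();
    return await renderHtml(page, html, format);
  } finally {
    await browser.close();
  }
}
```

A request handler would then call something like `htmlToBuffer(req.body.html, req.query.format)` and send the resulting buffer with the matching Content-Type.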

We would like to try alternatives to Puppeteer that do not rely on a headless browser. Another library, PDFKit, looks good, but it takes canvas-API-style input; we can't feed it HTML directly.

Any thoughts or input on this?

Anand Prem
  • 397
  • 5
  • 15

3 Answers

1

Any suggestions for converting HTML to PDF or PNG in Node.js without a headless browser instance?

Yes, you can try jsPDF. I have never used it before, but the syntax is simple.
Under the hood it does not appear to use any headless browser library; it seems to be a 100% pure JavaScript implementation.
You can feed the library an HTML string directly.
BUT there is no PNG option. For images, there are plenty of solutions that could be combined with jsPDF (so, HTML to PDF to PNG), as well as direct HTML to PNG solutions. Take a look here.

Does anyone use Puppeteer in a production environment? I would like to know about the resource utilisation and performance of running a headless browser in prod.

If you want to use Puppeteer, I suggest splitting the services: a simple HTTP server that only handles the HTTP communication with your clients, and a separate Puppeteer service. Both services must be scalable but, of course, the second will require more resources to run. To optimize resources, I suggest using puppeteer-cluster to create a cluster of Puppeteer workers. You can better handle errors, flow and concurrency, and at the same time you can save memory by using a single instance of Chromium (with the CONCURRENCY_PAGE or CONCURRENCY_CONTEXT model).
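A minimal sketch of such a worker pool, assuming the `puppeteer-cluster` package is installed. The `startRenderCluster` wrapper, the `maxConcurrency` value of 2 and the `{ html, format }` job shape are illustrative choices, not something prescribed by the library.

```javascript
// Sketch only: `startRenderCluster` and `maxConcurrency: 2` are illustrative.
async function startRenderCluster({
  maxConcurrency = 2,
  // Loaded lazily so the package is only required when no class is injected.
  Cluster = require('puppeteer-cluster').Cluster,
} = {}) {
  const cluster = await Cluster.launch({
    // One Chromium process; each job gets its own incognito context.
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency,
  });

  // Every job runs this task with a fresh page.
  await cluster.task(async ({ page, data }) => {
    await page.setContent(data.html, { waitUntil: 'networkidle0' });
    return data.format === 'png'
      ? page.screenshot({ type: 'png', fullPage: true })
      : page.pdf({ format: 'A4', printBackground: true });
  });

  return {
    // Resolves with whatever the task above returns for this job.
    render: (html, format) => cluster.execute({ html, format }),
    // Waits for running jobs, then shuts Chromium down.
    close: async () => {
      await cluster.idle();
      await cluster.close();
    },
  };
}
```

The HTTP service would create the cluster once at startup and call `render(html, 'pdf')` per request, so concurrent requests share a single Chromium instance instead of launching one each.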

radar155
  • 1,796
  • 2
  • 11
  • 28
  • Looks to be client-side; I'm more interested in the server side – wiredmartian Oct 31 '22 at 10:28
  • That's not true! You can use the library both in browsers and in Node.js-based services. Take a better look at the documentation. – radar155 Oct 31 '22 at 10:34
  • 1
    https://github.com/parallax/jsPDF#running-in-nodejs – radar155 Oct 31 '22 at 10:36
  • Taking a closer look, jsPDF uses html2canvas, which clearly says in its README that it is meant to be used in the browser. Makes sense: imagine making a tool that can understand all HTML/CSS and remains up to date with the spec. You'd be making a proto-browser. – Sergio Chumacero Apr 06 '23 at 22:01
  • **"You can feed the library directly with an HTML string."** [Not from node.js](https://github.com/parallax/jsPDF/issues/2970). That function only works in a browser. – Mud Jun 22 '23 at 21:11
0

If you can use Docker, then a great solution for you may be Gotenberg.

It's an incredible service that can convert a lot of formats (HTML, Markdown, Word, Excel, etc.) into PDF.

If your page render depends on JavaScript, then no problem, it will run it and wait (you can even configure the max wait time) for the page to be completely rendered to generate your PDF.

We are using it for an application that generates 3000 PDFs per day and never had any issue with it.

Demo:

Take a look at this sample HTML invoice: https://sparksuite.github.io/simple-html-invoice-template/

Now let's convert it to PDF:

[Screenshot of the request; the numbered callouts 1 to 3 are described below]

Boom, done!

1: Gotenberg URL (here using a demo endpoint provided by Gotenberg team with some limitations like 2 requests per second per IP and 5MB body limit)

2: Pass a url parameter with the URL of the webpage you want to convert

3: You get the PDF as the HTTP response with Content-Type application/pdf

Curl version:

curl --location --request POST 'https://demo.gotenberg.dev/forms/chromium/convert/url' \
--form 'url="https://sparksuite.github.io/simple-html-invoice-template/"' \
-o myfile.pdf

Node.js version:

// require() works with node-fetch v2; v3 is ESM-only.
const fetch = require('node-fetch');
const FormData = require('form-data');
const fs = require('fs');

async function main() {
  const formData = new FormData();
  formData.append('url', 'https://sparksuite.github.io/simple-html-invoice-template/');
  const res = await fetch('https://demo.gotenberg.dev/forms/chromium/convert/url', {
    method: 'POST',
    body: formData
  });
  // Fail loudly instead of writing an error response to disk as a PDF.
  if (!res.ok) throw new Error(`Gotenberg responded with HTTP ${res.status}`);
  const pdfBuffer = await res.buffer();
  // You can do whatever you like with the pdfBuffer, such as writing it to the disk:
  fs.writeFileSync('/home/myfile.pdf', pdfBuffer);
}

main().catch(console.error);

Using your own Docker instance instead of the demo endpoint, here is what you need to do:

1. Create the Gotenberg Docker container:

docker run -p 3333:3000 gotenberg/gotenberg:7 gotenberg

2. Call the http://localhost:3333/forms/chromium/convert/url endpoint:

[Screenshot of the same request sent to the local endpoint]

Curl version:

curl --location --request POST 'http://localhost:3333/forms/chromium/convert/url' \
--form 'url="https://sparksuite.github.io/simple-html-invoice-template/"' \
-o myfile.pdf

Node.js version:

// require() works with node-fetch v2; v3 is ESM-only.
const fetch = require('node-fetch');
const FormData = require('form-data');
const fs = require('fs');

async function main() {
  const formData = new FormData();
  formData.append('url', 'https://sparksuite.github.io/simple-html-invoice-template/');
  const res = await fetch('http://localhost:3333/forms/chromium/convert/url', {
    method: 'POST',
    body: formData
  });
  // Fail loudly instead of writing an error response to disk as a PDF.
  if (!res.ok) throw new Error(`Gotenberg responded with HTTP ${res.status}`);
  const pdfBuffer = await res.buffer();
  // You can do whatever you like with the pdfBuffer, such as writing it to the disk:
  fs.writeFileSync('/home/myfile.pdf', pdfBuffer);
}

main().catch(console.error);
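Since the question starts from an HTML string rather than a URL, here is a hedged sketch of the same idea using Gotenberg's HTML route. It assumes Node.js 18+ (global `fetch`, `FormData` and `Blob`) and a Gotenberg 7 container on port 3333 as above. The `/forms/chromium/convert/html` path and the `index.html` file name follow the Gotenberg 7 documentation as I understand it, so check them against the version you run. `buildHtmlForm` and `htmlStringToPdf` are hypothetical helper names.

```javascript
// Sketch only, assuming Node.js 18+ and a local Gotenberg 7 container.
function buildHtmlForm(html) {
  const form = new FormData();
  // The HTML route expects the document as an uploaded file named index.html.
  form.append('files', new Blob([html], { type: 'text/html' }), 'index.html');
  return form;
}

async function htmlStringToPdf(html, baseUrl = 'http://localhost:3333') {
  const res = await fetch(`${baseUrl}/forms/chromium/convert/html`, {
    method: 'POST',
    body: buildHtmlForm(html),
  });
  if (!res.ok) {
    throw new Error(`Gotenberg responded with HTTP ${res.status}`);
  }
  // Return the PDF bytes as a Node.js Buffer.
  return Buffer.from(await res.arrayBuffer());
}
```

Calling `htmlStringToPdf('<h1>Invoice</h1>')` would then resolve to a Buffer that can be written to disk or streamed back to the client.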

Gotenberg homepage: https://gotenberg.dev/

Vincent
  • 3,945
  • 3
  • 13
  • 25
  • Doesn't gotenberg use Chromium under the hood? The question asked for a non-headless browser solution – cdimitroulas Jun 16 '23 at 10:50
  • @cdimitroulas, you're right, it does use Chromium. I interpreted "I want to convert HTML to PDF without headless browser instance" to "Please suggest a solution where I don't need to write code to create/navigate a headless browser because it's a pain", so I suggested this approach that just works without the headless browser hassle, but my interpretation might be wrong, indeed. – Vincent Jun 16 '23 at 15:33
0

If you have access to the wkhtmltopdf command, I recommend it.

We use it successfully on our production website to generate PDFs.

First generate the HTML file file_name, then run:

wkhtmltopdf --encoding utf8 --disable-smart-shrinking --dpi 100 -s {paper_size} -O {orientation}  '{file_name}'
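
From Node.js, the command above could be invoked roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the explicit output path is my addition (wkhtmltopdf normally takes an input and an output argument); the flags mirror the command above.

```javascript
const { execFile } = require('node:child_process');

// Sketch only: helper names and the explicit output path are assumptions.
function buildWkhtmltopdfArgs({ paperSize, orientation, input, output }) {
  return [
    '--encoding', 'utf8',
    '--disable-smart-shrinking',
    '--dpi', '100',
    '-s', paperSize,   // e.g. 'A4'
    '-O', orientation, // 'Portrait' or 'Landscape'
    input,
    output,
  ];
}

// execFile passes arguments without a shell, so file names need no quoting.
function runWkhtmltopdf(options) {
  return new Promise((resolve, reject) => {
    execFile('wkhtmltopdf', buildWkhtmltopdfArgs(options), (err) => {
      if (err) reject(err);
      else resolve(options.output);
    });
  });
}
```

For example, `runWkhtmltopdf({ paperSize: 'A4', orientation: 'Portrait', input: 'invoice.html', output: 'invoice.pdf' })` resolves with the output path once the process exits successfully.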
anaconda
  • 1,065
  • 10
  • 20
  • https://wkhtmltopdf.org/ is in fact headless WebKit under the hood, which the OP ruled out, but maybe it could be less demanding than the current approach after all. – myf Nov 04 '22 at 10:35
  • I tried to refactor to not use `wkhtmltopdf`, but the solutions I found were too slow at generating PDFs from HTML. The other option was to create those PDFs manually using some server-side PDF library. No time to explore that solution. – anaconda Nov 04 '22 at 11:34