Puppeteer Generate PDF from multiple HTML strings

Question

I am using Puppeteer to generate PDF files from HTML strings. Reading the documentation, I found two ways of generating the PDF files:

The first is to pass a URL and call the goto method:

await page.goto('https://example.com');
await page.pdf({format: 'A4'});

The second, which is my case, is to call the setContent method:

await page.setContent('<p>Hello, world!</p>');
await page.pdf({format: 'A4'});

The problem is that the client sends me 3 different HTML strings, and I want to generate a single PDF file with one page per string (3 pages for 3 HTML strings).

Is there a way to do this with Puppeteer? I am open to other suggestions, but I need to use headless Chrome.

I would basically approach this as: (1) a Puppeteer script that does three separate page.goto calls, (2) a variable to hold each of the 3 scraped HTML strings from those 3 HTML pages, and (3) at the end, generate 3 separate PDF files. I'm not sure you can merge PDF documents with Puppeteer. If you find a way to do it, please post your solution here. — tamak, Jan 31 '18 at 04:51

Juan Rivillas · Accepted Answer · 2018-04-17T12:54:47.373

I was able to do this by doing the following:

Generate 3 separate PDFs with Puppeteer. You can either save each file locally or keep it in a variable.
I saved the files locally, because all the PDF merge plugins I found only accept file locations and do not accept buffers, for instance. After generating the PDFs locally, one after another, I merged them using easy-pdf-merge.

The code is like this:

const path = require('path');
const puppeteer = require('puppeteer');
const pdfMerge = require('easy-pdf-merge');

const page1 = '<h1>HTML from page1</h1>';
const page2 = '<h1>HTML from page2</h1>';
const page3 = '<h1>HTML from page3</h1>';

(async () => {
  const browser = await puppeteer.launch();
  const tab = await browser.newPage();

  // Render each HTML string to its own PDF file, one after another.
  await tab.setContent(page1);
  await tab.pdf({ path: './page1.pdf' });

  await tab.setContent(page2);
  await tab.pdf({ path: './page2.pdf' });

  await tab.setContent(page3);
  await tab.pdf({ path: './page3.pdf' });

  await browser.close();

  // Merge the three files into a single PDF next to this script.
  pdfMerge(
    ['./page1.pdf', './page2.pdf', './page3.pdf'],
    path.join(__dirname, './mergedFile.pdf'),
    (err) => {
      if (err) return console.log(err);
      console.log('Successfully merged!');
    }
  );
})();
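
As an aside, and not what this answer does: if the three strings are body-level HTML fragments (an assumption, since full documents with their own head and styles would not concatenate cleanly), the merge step might be avoided by putting them into one document separated by CSS page breaks. A minimal sketch, where combineHtmlPages and renderCombinedPdf are hypothetical helper names:

```javascript
// Sketch: render several HTML fragments as one PDF, one fragment per page.
// Assumption: each string is a body-level fragment, not a full <html> document.

/** Wrap each fragment in a block that forces a page break after it (hypothetical helper). */
function combineHtmlPages(fragments) {
  const sections = fragments.map((html, i) => {
    // No break after the last fragment, so no forced trailing page is requested.
    const isLast = i === fragments.length - 1;
    const style = isLast ? '' : ' style="page-break-after: always"';
    return `<div${style}>${html}</div>`;
  });
  return `<!DOCTYPE html><html><body>${sections.join('')}</body></html>`;
}

/** Render the combined document with Puppeteer (required lazily, only when called). */
async function renderCombinedPdf(fragments, outputPath) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(combineHtmlPages(fragments));
    await page.pdf({ path: outputPath, format: 'A4' });
  } finally {
    // Close the browser even if rendering fails.
    await browser.close();
  }
}
```

A fragment that is longer than one page will still spill onto additional pages, so this only guarantees that each string starts on a fresh page.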

What are page1, page2 and page3? Are they three different URLs or browser.newPage() calls? Also, the method name changed from setContent to content as per the latest documentation. — Haresh Chhelana, Apr 17 '18 at 05:18
page1, page2, and page3 contain the HTML of three different pages. They are strings. Thanks for the tip on `content`. I'll update it — Juan Rivillas, Apr 17 '18 at 10:23
Is there any open-source library for PDF merging? As far as I can see, easy-pdf-merge is not open source — Dharmarajan, Jan 15 '21 at 05:48

Haresh Chhelana · Answer 2 · 2018-05-28T06:14:16.487

I was able to generate multiple PDFs from multiple URLs, and merge them, with the code below:

package.json

{
 ............
 ............

 "dependencies": {
    "puppeteer": "^1.1.1",
    "easy-pdf-merge": "0.1.3"
 }

 ..............
 ..............
}

index.js

const puppeteer = require('puppeteer');
const merge = require('easy-pdf-merge');

var pdfUrls = ["http://www.google.com","http://www.yahoo.com"];

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  var pdfFiles=[];

  for(var i=0; i<pdfUrls.length; i++){
    await page.goto(pdfUrls[i], {waitUntil: 'networkidle2'});
    var pdfFileName =  'sample'+(i+1)+'.pdf';
    pdfFiles.push(pdfFileName);
    await page.pdf({path: pdfFileName, format: 'A4'});
  }

  await browser.close();

  await mergeMultiplePDF(pdfFiles);
})();

const mergeMultiplePDF = (pdfFiles) => {
    return new Promise((resolve, reject) => {
        merge(pdfFiles,'samplefinal.pdf',function(err){

            if(err){
                console.log(err);
                // Return here so the success path below does not also run.
                return reject(err);
            }

            console.log('Success');
            resolve();
        });
    });
};

RUN Command: node index.js
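
The hand-written Promise wrapper above could also be expressed with Node's built-in util.promisify, which rejects on an error and resolves otherwise. The sketch below assumes the merge function follows the usual Node callback convention, merge(files, destination, callback(err)), as the call in index.js does. fakeMerge is a hypothetical stand-in so the example runs without any PDF tooling; in practice you would pass the real merge function instead.

```javascript
const { promisify } = require('util');

// Assumption: mergeFn has the shape mergeFn(sourceFiles, destinationFile, callback(err)).
function makeMergeAsync(mergeFn) {
  return promisify(mergeFn);
}

// Hypothetical stand-in for the real merger; it only validates its input.
function fakeMerge(files, dest, callback) {
  if (files.length < 2) {
    callback(new Error('need at least two PDFs'));
    return;
  }
  callback(null);
}

const mergeAsync = makeMergeAsync(fakeMerge);

// Usage inside an async function:
//   await mergeAsync(pdfFiles, 'samplefinal.pdf');
```

With this shape there is no code path in which an error is reported and the promise still resolves.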

Is there any open-source library for PDF merging? As far as I can see, easy-pdf-merge is not open source — Dharmarajan, Jan 15 '21 at 05:48

ggorlen · Answer 3 · 2023-03-05T02:34:19.223

pdf-merger-js is another option. page.setContent should work just the same as a drop-in replacement for page.goto below:

const PDFMerger = require("pdf-merger-js"); // ^4.2.1
const puppeteer = require("puppeteer"); // ^19.7.2

const urls = [
  "https://news.ycombinator.com",
  "https://www.example.com",
  "https://en.wikipedia.org",
  // ...
];
const filename = "merged.pdf";

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const merger = new PDFMerger();

  for (const url of urls) {
    await page.goto(url);
    await merger.add(await page.pdf());
  }

  await merger.save(filename);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
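
If the merged PDF should go back to the client rather than onto disk, the same loop can collect into a buffer. The sketch below assumes a merger class shaped like pdf-merger-js ^4 (async add(buffer) and async saveAsBuffer()); mergeToBuffer is a hypothetical helper, and FakeMerger is a stand-in that only concatenates bytes so the example runs without the library. It is not a real PDF merge.

```javascript
// Sketch: merge PDF buffers in memory and return a single Buffer.
// Assumption: MergerClass instances expose async add(buffer) and async saveAsBuffer().
async function mergeToBuffer(pdfBuffers, MergerClass) {
  const merger = new MergerClass();
  for (const buf of pdfBuffers) {
    // Add sequentially so the output keeps the input order.
    await merger.add(buf);
  }
  return merger.saveAsBuffer();
}

// Hypothetical stand-in for the real merger class: concatenates raw bytes only.
class FakeMerger {
  constructor() {
    this.parts = [];
  }
  async add(buf) {
    this.parts.push(buf);
  }
  async saveAsBuffer() {
    return Buffer.concat(this.parts);
  }
}

// Usage inside an async function, with the real library:
//   const merged = await mergeToBuffer(buffers, require('pdf-merger-js'));
```

Passing the class in as a parameter is only there to keep the sketch runnable on its own; with the library installed you could construct PDFMerger directly as in the code above.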

edited Mar 05 '23 at 02:34

answered Jun 10 '21 at 03:39


it also worked for me with this plugin. I think it's the only one still being maintained – Ruben Aug 28 '21 at 17:01
This also has a `.saveAsBuffer()` option, which returns the PDF as a buffer rather than saving it to your hard drive. – Itinerati Sep 02 '21 at 11:31
