55

I'm using Headless Chrome to print PDF files with the printToPDF CDP method. If we set the displayHeaderFooter parameter to true, we can define a custom page header and footer with the headerTemplate and footerTemplate parameters. The protocol provides HTML classes that inject values into the templates: date, title, url, pageNumber, and totalPages.

For example, we can set footerTemplate to <span class="pageNumber"></span> to display the current page number in the footer. We also need to add some style to display it properly. The default header and footer settings can be found here, and the renderer C++ component is here.
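As a minimal sketch of the parameters described above, the object below could be passed to Page.printToPDF. The helper name buildPrintParams, the inline styles, and the empty header span are illustrative assumptions, not part of the protocol.

```javascript
// Hypothetical helper: builds a parameter object for the CDP Page.printToPDF method.
function buildPrintParams() {
  // Chrome fills elements carrying the pageNumber / totalPages classes at print time.
  // An explicit font-size is set because the template needs its own styling to be readable.
  const footerTemplate =
    '<div style="font-size: 10px; width: 100%; text-align: center;">' +
    'Page <span class="pageNumber"></span> of <span class="totalPages"></span>' +
    '</div>';

  return {
    displayHeaderFooter: true,
    headerTemplate: '<span></span>', // assumption: an empty element to avoid the default header
    footerTemplate,
    marginBottom: 0.5 // CDP margins are given in inches; leaves room for the footer
  };
}

console.log(buildPrintParams());
```

The values Chrome injects are plain text, so this only controls where and how the numbers appear, not which numbers are shown.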

I would like to modify the displayed pageNumber values. My goal is to count pages from a given number.

The Puppeteer API documentation notes that headerTemplate and footerTemplate markup have the following limitations:

  1. Script tags inside templates are not evaluated.
  2. Page styles are not visible inside templates.

A GitHub comment provides the following:

<div style="font-size: 10px;">
  <div id="test">header test</div>
  <img src='http://www.chromium.org/_/rsrc/1438879449147/config/customLogo.gif?revision=3' onload='document.getElementById("test").style.color = "green";this.parentNode.removeChild(this);'/>
</div>

It claims that an onload attribute on an img tag lets us run JavaScript in the templates. However, I was not able to reproduce the result shown in the screenshot under the snippet.

For example, the following JavaScript could count pages from 10:

<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" alt="tmpimg" 
onload="var x = document.getElementById('pn').innerHTML; var y = 10; document.getElementById('pn').innerHTML = parseInt(x) + y; this.parentNode.removeChild(this);"/>
<span id="pn" class="pageNumber"></span>

But unfortunately, this script does not modify the page numbering, and I have no idea how to solve this problem. I've also tried to use pure CSS solutions, but without success.

Any ideas are welcome to resolve this issue.

Monish Khatri
user9179380

4 Answers

1

I tried straightforward approaches to solve this problem and they didn't work. Even obscure APIs like CSS expressions and counters don't solve it. Fortunately, there is a simple enough workaround.

We print each page separately using the pageRanges parameter and then combine all the pages to generate the required PDF. This lets us build each header/footer as a function of the page number. For example:

const footerTemplate = function (pageNumber) {
    return `<div>Page number: ${pageNumber + 24}</div>`;
};

We need to iterate over each page and print it.

const printPage = function (pageNumber) {
    return {
        ...
        path: `html-page-${pageNumber}.pdf`,
        footerTemplate: footerTemplate(pageNumber),
        pageRanges: String(pageNumber)
    };
};


(async function () {
    ...
    const page = await browser.newPage();
    var pageNumber = 1;
    try {
        while (pageNumber > 0) {
            await page.pdf(printPage(pageNumber));
            pageNumber += 1;
        }
    } catch (e) {
        // page.pdf rejects once pageNumber is out of range, which ends the loop
    } finally {
       // Merge and clean up
    }
})();

There is no trivial way to determine the total number of pages to print. So we don't know when to stop. Fortunately, Chrome sends an error when we try to print a page that's out of range. So we can use that to stop our printing.
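The stop-on-error loop can be sketched independently of Puppeteer. In the sketch below, printUntilOutOfRange and its printOne argument are hypothetical names: printOne stands in for a call such as page.pdf with a single-page pageRanges value, and it is assumed to reject when the page does not exist. The maxPages cap is an added safety guard, not something Chrome requires.

```javascript
// Calls printOne(1), printOne(2), ... until it rejects, and returns how many pages succeeded.
// printOne is a stand-in for the real print call; any rejection is treated as "out of range".
async function printUntilOutOfRange(printOne, maxPages = 1000) {
  let printed = 0;
  for (let n = 1; n <= maxPages; n++) {
    try {
      await printOne(n);
    } catch (err) {
      break; // stop at the first failure
    }
    printed = n;
  }
  return printed;
}

// Demo with a fake printer that only "has" three pages.
const fakePrint = async (n) => {
  if (n > 3) throw new Error('page out of range');
};

printUntilOutOfRange(fakePrint).then((count) => {
  console.log(`printed ${count} pages`);
});
```

Because every error ends the loop, a genuine failure (for example a write error) is indistinguishable from the out-of-range signal here; inspecting the error before breaking would be a reasonable refinement.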

Below is a working sample with page numbers offset by 24. It requires pdf-merger-js and puppeteer as dependencies (fs is built into Node.js).

const puppeteer = require("puppeteer");
const PDFMerger = require('pdf-merger-js');
const fs = require("fs");

const footerTemplate = function (pageNumber) {
    return `<div style="font-size: 10px; display: flex; flex-direction: row; justify-content: space-between; width: 100%" id='template'>
        <div>Page number: ${pageNumber + 24}</div>
    </div>`;
};

const mergePdfs = async function (totalPages, fileName) {
    var merger = new PDFMerger();
    for (var pageNumber = 1; pageNumber < totalPages; pageNumber++) {
        await merger.add(`html-page-${pageNumber}.pdf`);
    }
    await merger.save(fileName);
};

const cleanup = function (totalPages) {
    for (var pageNumber = 1; pageNumber < totalPages; pageNumber++) {
        var path = `html-page-${pageNumber}.pdf`
        fs.rmSync(path);
    }
};

const pageSetup = function (pageNumber) {
    return {
        path: `html-page-${pageNumber}.pdf`,
        format: 'Letter',
        printBackground: true,
        displayHeaderFooter: true,
        footerTemplate: footerTemplate(pageNumber),
        pageRanges: String(pageNumber),
        margin: {
            top: '1in',
            right: '0in',
            bottom: '1in',
            left: '0in'
        }
    };
};


(async function () {
    const browser = await puppeteer.launch({
        ignoreHTTPSErrors: true,
        dumpio: true,
        headless: true
    });
    const page = await browser.newPage();
    await page.goto('http://worrydream.com/KillMath/');
    var pageNumber = 1;
    try {
        while (pageNumber > 0) {
            await page.pdf(pageSetup(pageNumber));
            pageNumber += 1;
        }
    } catch (e) {
        await mergePdfs(pageNumber, 'html-page.pdf');
        cleanup(pageNumber);
    }
    await browser.close();
})();
Shlomo
TheChetan
1

This shows how to print each page separately with a footer that displays the page number with an offset of 10, and then merge all the pages into a single PDF using pdf-merger-js, as TheChetan said. This example assumes you have already installed pdf-merger-js and puppeteer as dependencies (fs is built into Node.js). The pdfPath variable should be set to the path where the merged PDF file will be saved:

const puppeteer = require('puppeteer');
const PDFMerger = require('pdf-merger-js');
const fs = require('fs');

const headerTemplate = '<div style="font-size: 10px; margin-left: 20px;">Header</div>';
const footerTemplate = function (pageNumber) {
  return `<div style="font-size: 10px; margin-left: 20px;">Page number: ${pageNumber + 10}</div>`;
};

const printPage = function (pageNumber) {
  return {
    path: `html-page-${pageNumber}.pdf`,
    displayHeaderFooter: true,
    headerTemplate: headerTemplate,
    footerTemplate: footerTemplate(pageNumber),
    pageRanges: String(pageNumber)
  };
};

(async function () {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  let pageNumber = 1;

  try {
    while (true) {
      await page.goto(`https://www.example.com/page=${pageNumber}`, { waitUntil: 'networkidle2' });
      // The path option makes page.pdf write html-page-N.pdf itself
      await page.pdf(printPage(pageNumber));

      pageNumber += 1;
    }
  } catch (e) {
    // page.pdf throws when the requested page is out of range, which ends the loop
  } finally {
    await browser.close();

    const merger = new PDFMerger();
    for (let i = 1; i < pageNumber; i++) {
      await merger.add(`html-page-${i}.pdf`); // wait for each file to be added before saving
    }

    const pdfPath = 'output.pdf'; // Set the path to save the merged PDF
    await merger.save(pdfPath);
  }
})();

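This code assumes a pattern for the page URLs where the page number can be changed in the URL (e.g. https://www.example.com/page=1, https://www.example.com/page=2, etc.). You will need to adjust the code to match the URL pattern of the website you're using. Note that pageRanges selects the Nth PDF page of whichever document is currently loaded, so if each URL renders to a single PDF page, pageRanges should be '1'; to split one long document instead, move page.goto out of the loop.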
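The footer above hard-codes the offset, so page 1 is labelled 11. If the goal is to start counting from a given number, the offset can be made a parameter. The sketch below uses a hypothetical helper name, footerTemplateFrom, and assumes pageNumber is the 1-based page index used in the loop.

```javascript
// Hypothetical helper: returns a footer-template function whose first page shows firstNumber.
function footerTemplateFrom(firstNumber) {
  return function (pageNumber) {
    // pageNumber is 1-based, so page 1 maps to firstNumber itself
    const shown = firstNumber + pageNumber - 1;
    return `<div style="font-size: 10px; margin-left: 20px;">Page number: ${shown}</div>`;
  };
}

// Count pages from 10: page 1 shows 10, page 2 shows 11, and so on.
const footerFrom10 = footerTemplateFrom(10);
console.log(footerFrom10(1));
console.log(footerFrom10(2));
```

The returned function has the same shape as footerTemplate in the sample, so it can be dropped into printPage unchanged.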

I hope this helps!

1

To modify the first page number, you can use the @page CSS rule in the @media print media query. Here's an example:

@page {
  /* Set the initial page number */
  @top-center {
    content: counter(page);
  }
}

This displays the page counter, which starts at 1 by default. If you want to start from a different number, you can use the counter-reset property to set the page counter to a different value:

@page {
  /* Reset the page counter to 4 */
  @top-center {
    content: counter(page);
    counter-reset: page 3;
  }
}

To execute JavaScript in the header or footer template, you can use the {{#inHeader}} and {{#inFooter}} block helpers provided by puppeteer. Here's an example:

{{#inHeader}}
  <script>
    console.log('This is executed in the header');
  </script>
{{/inHeader}}

{{#inFooter}}
  <script>
    console.log('This is executed in the footer');
  </script>
{{/inFooter}}

This will execute the JavaScript code in the header or footer of each page.

Note that the {{#inHeader}} and {{#inFooter}} block helpers are only available in the header and footer templates, not in the main content template. If you need to execute JavaScript in the main content template, you can use the {{#custom}} block helper and pass a parameter to indicate where the code should be executed. For example:

{{#custom "header"}}
  <script>
    console.log('This is executed in the header');
  </script>
{{/custom}}

{{#custom "footer"}}
  <script>
    console.log('This is executed in the footer');
  </script>
{{/custom}}

Then, in your puppeteer code, you can define a handlebars function to register the custom block helper:

const handlebars = require('handlebars');

handlebars.registerHelper('custom', function(position, options) {
  if (position === 'header') {
    return options.fn({inHeader: true});
  } else if (position === 'footer') {
    return options.fn({inFooter: true});
  } else {
    return options.fn({});
  }
});

This will allow you to execute JavaScript code in the header, footer, or main content template.

code_love
  • Are you able to provide any links to where `inHeader` or `inFooter` is documented in Puppeteer? I can't find any references to it so I suspect this is custom to your code. – James Hulse Jun 14 '23 at 02:47
0

Here's the JavaScript code to modify the first page number or execute JS in header or footer template with Chrome DevTools Protocol's printToPDF:

const CDP = require('chrome-remote-interface');
const fs = require('fs');

CDP(async(client) => {
    const {Page, Runtime} = client;
    await Promise.all([Page.enable(), Runtime.enable()]);

    // Modify the first page number
    await Page.navigate({url: 'http://example.com'});
    await Page.loadEventFired();
    // Runtime.evaluate resolves to { result: { value, ... } }, so read result.value
    const pageNumber = await Runtime.evaluate({expression: 'document.querySelector(".page-number:first-of-type").innerText'});
    console.log(`First page number: ${pageNumber.result.value}`);
    await Runtime.evaluate({expression: 'document.querySelector(".page-number:first-of-type").innerText = "1"'});
    const modifiedPageNumber = await Runtime.evaluate({expression: 'document.querySelector(".page-number:first-of-type").innerText'});
    console.log(`Modified first page number: ${modifiedPageNumber.result.value}`);

    // Execute JS in header or footer template
    const headerTemplate = `
        <style>@page {size:A4 portrait;margin:.5cm .5cm 5cm .5cm;border:1px solid #bbb;}</style>
        <div style="font-size:14px;padding:5px;width:100%;text-align:center;border-bottom:1px solid #bbb;">Example Header</div>
    `;
    const footerTemplate = `
        <style>@page {size:A4 portrait;margin:.5cm .5cm 5cm .5cm;border:1px solid #bbb;}</style>
        <div style="font-size:14px;padding:5px;width:100%;text-align:center;border-top:1px solid #bbb;">Example Footer</div>
    `;
    const pdfOptions = {
        headerTemplate,
        footerTemplate,
        displayHeaderFooter: true,
        printBackground: true,
        // Page.printToPDF takes margins in inches as separate fields (1 cm is about 0.39 in)
        marginTop: 0.39,
        marginBottom: 0.39,
        marginLeft: 0.39,
        marginRight: 0.39
    };
    const pdfData = await Page.printToPDF(pdfOptions);
    fs.writeFileSync('example.pdf', pdfData.data, 'base64');

    await client.close();
}).on('error', (err) => {
    console.error(err);
});
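One detail worth checking when calling Page.printToPDF directly: the protocol takes margins as the numeric fields marginTop, marginBottom, marginLeft and marginRight, measured in inches, rather than as a Puppeteer-style margin object with unit strings. A small sketch of a conversion, where cmToInches and buildCdpMargins are hypothetical helper names:

```javascript
const CM_PER_INCH = 2.54;

// Converts centimetres to inches, the unit used by the printToPDF margin fields.
function cmToInches(cm) {
  return cm / CM_PER_INCH;
}

// Hypothetical helper: builds the four margin fields from a single size in centimetres.
function buildCdpMargins(cm) {
  const inches = cmToInches(cm);
  return {
    marginTop: inches,
    marginBottom: inches,
    marginLeft: inches,
    marginRight: inches
  };
}

// Spread the margins into the rest of the printToPDF parameters.
const cdpPdfParams = {
  displayHeaderFooter: true,
  printBackground: true,
  ...buildCdpMargins(1)
};
console.log(cdpPdfParams);
```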