Open Puppeteer with specific configuration (download PDF instead of PDF viewer)

Question

I would like to open Chromium with a specific configuration.

I am looking for the configuration that activates the following option:

Settings => Site Settings => Permissions => PDF documents => "Download PDF files instead of automatically opening them in Chrome"

I searched the switches on this command-line switch page, but the only parameter that deals with PDF is --print-to-pdf, which does not match my need.

Do you have any ideas?

score 9 · Answer 1 · answered Aug 03 '20 at 15:43

There is no option you can pass to Puppeteer to force PDF downloads. However, you can use the Chrome DevTools Protocol (CDP) to add a content-disposition: attachment response header, which forces the browser to download the file.

A visual flow of what you need to do:

[Figure: cdp-modify-response-header]

Full example code is below. In this example, PDF files and XML files are downloaded in headful mode.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null, 
  });

  const page = await browser.newPage();

  const client = await page.target().createCDPSession();

  await client.send('Fetch.enable', {
    patterns: [
      {
        urlPattern: '*',
        requestStage: 'Response',
      },
    ],
  });

  // client.on registers a listener synchronously, so no await is needed.
  client.on('Fetch.requestPaused', async (reqEvent) => {
    const { requestId } = reqEvent;

    let responseHeaders = reqEvent.responseHeaders || [];
    let contentType = '';

    for (let elements of responseHeaders) {
      if (elements.name.toLowerCase() === 'content-type') {
        contentType = elements.value;
      }
    }

    if (contentType.endsWith('pdf') || contentType.endsWith('xml')) {

      responseHeaders.push({
        name: 'content-disposition',
        value: 'attachment',
      });

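      const responseObj = await client.send('Fetch.getResponseBody', {
        requestId,
      });

      // Fetch.fulfillRequest expects a base64-encoded body, so encode
      // the body when getResponseBody returned it as plain text.
      const body = responseObj.base64Encoded
        ? responseObj.body
        : Buffer.from(responseObj.body).toString('base64');

      await client.send('Fetch.fulfillRequest', {
        requestId,
        // Preserve the original status instead of hard-coding 200.
        responseCode: reqEvent.responseStatusCode || 200,
        responseHeaders,
        body,
      });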
    } else {
      await client.send('Fetch.continueRequest', { requestId });
    }
  });

  await page.goto('https://pdf-xml-download-test.vercel.app/');

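  // Keep the browser open long enough for the downloads to finish.
  // page.waitFor is deprecated, so use a plain timer instead.
  await new Promise((resolve) => setTimeout(resolve, 100000));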

  await client.send('Fetch.disable');

  await browser.close();
})();

For a more detailed explanation, please refer to the Git repo I've set up with comments. It also includes example code for Playwright.
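Two details of the example may need tightening, depending on what the server sends. If the response already carries a content-disposition header (for example inline with a filename), pushing a second one leaves two conflicting headers, and contentType.endsWith('pdf') misses values such as application/pdf; charset=binary. The helpers below are a minimal sketch of one way to handle both cases; forceAttachment and isPdfOrXml are hypothetical names, not part of Puppeteer or CDP.

```javascript
// Hypothetical helper: decide whether a content-type should be downloaded.
// Parameters such as "; charset=utf-8" are ignored before matching.
function isPdfOrXml(contentType) {
  const mime = String(contentType).split(';')[0].trim().toLowerCase();
  return mime.endsWith('pdf') || mime.endsWith('xml');
}

// Hypothetical helper: return a new header list with exactly one
// content-disposition header whose disposition type is "attachment".
// Existing parameters (for example filename) are kept as they are.
function forceAttachment(responseHeaders = []) {
  let found = false;
  const headers = responseHeaders.map((header) => {
    if (header.name.toLowerCase() !== 'content-disposition') {
      return header;
    }
    found = true;
    const params = header.value.split(';').slice(1).join(';');
    return {
      name: header.name,
      value: params ? `attachment;${params}` : 'attachment',
    };
  });
  if (!found) {
    headers.push({ name: 'content-disposition', value: 'attachment' });
  }
  return headers;
}
```

Inside the Fetch.requestPaused handler, isPdfOrXml(contentType) could stand in for the endsWith check, and forceAttachment(responseHeaders) could produce the list passed to Fetch.fulfillRequest.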

I've tried many solutions and only this one works in my case, except that `await page.goto` throws a net::ERR_ABORTED error. Catching and ignoring that error, then checking the local path to confirm whether the download succeeded, solves the issue. - imckl, Nov 06 '20 at 02:10
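The workaround described in this comment could be sketched roughly as follows. gotoAllowingDownload and waitForFile are hypothetical helper names, and the sketch assumes you already know the path where the browser will save the file. It also assumes that the file only appears under its final name once the download has finished, which is worth verifying in your setup.

```javascript
const fs = require('fs');

// Hypothetical helper: navigate, but swallow the net::ERR_ABORTED error
// that page.goto raises when the navigation turns into a download.
async function gotoAllowingDownload(page, url) {
  try {
    await page.goto(url);
  } catch (err) {
    const message = String(err && err.message);
    if (!message.includes('net::ERR_ABORTED')) {
      throw err; // Any other failure is still a real error.
    }
  }
}

// Hypothetical helper: poll until a non-empty file exists at filePath.
// Resolves to true if it appears before the timeout, false otherwise.
async function waitForFile(filePath, timeoutMs = 30000, intervalMs = 250) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (fs.existsSync(filePath) && fs.statSync(filePath).size > 0) {
      return true;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

A caller would await gotoAllowingDownload(page, url) and then check the result of waitForFile with the expected download path before closing the browser.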

score 2 · Answer 2 · answered May 22 '19 at 17:17

Puppeteer currently does not support navigating (or downloading) PDFs in headless mode that easily. Quote from the docs for the page.goto function:

NOTE Headless mode doesn't support navigation to a PDF document. See the upstream issue.

What you can do, though, is detect whether the browser is navigating to the PDF file and then download it yourself via Node.js.

Code sample

const puppeteer = require('puppeteer');
const http = require('http');
const https = require('https');
const fs = require('fs');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    page.on('request', req => {
        if (req.url() === '...') {
            // http.get throws on https: URLs, so pick the module by scheme.
            const client = req.url().startsWith('https:') ? https : http;
            const file = fs.createWriteStream('./file.pdf');
            client.get(req.url(), response => response.pipe(file));
        }
    });

    await page.goto('...');
    await browser.close();
})();

This navigates to a URL and monitors the ongoing requests. When the matching request is found, Node.js downloads the file itself via http.get and pipes it into file.pdf. Please be aware that this is a minimal working example. You will want to catch errors when downloading, and you might also want to use something more sophisticated than http.get, depending on the situation.
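As a rough illustration of the error handling mentioned above, the download could be wrapped in a promise that rejects on network errors, non-200 responses, and write failures. downloadToFile is a hypothetical helper name. The sketch does not follow redirects and does not forward the browser's cookies, so it only fits URLs that are reachable without the page's session.

```javascript
const fs = require('fs');
const http = require('http');
const https = require('https');
const { pipeline } = require('stream');

// Hypothetical helper: download url into destPath and resolve with
// destPath once the file has been fully written.
function downloadToFile(url, destPath) {
  return new Promise((resolve, reject) => {
    // Choose the client module that matches the URL scheme.
    const client = new URL(url).protocol === 'https:' ? https : http;

    const request = client.get(url, (response) => {
      if (response.statusCode !== 200) {
        response.resume(); // Drain the body so the socket is released.
        reject(new Error(`Unexpected status ${response.statusCode} for ${url}`));
        return;
      }
      // pipeline reports errors from either stream and cleans both up.
      pipeline(response, fs.createWriteStream(destPath), (err) => {
        if (err) {
          reject(err);
        } else {
          resolve(destPath);
        }
      });
    });

    // Connection-level failures (DNS errors, refused connections, ...).
    request.on('error', reject);
  });
}
```

In the request handler above, the http.get call could then become downloadToFile(req.url(), './file.pdf').catch(console.error).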

Future note

In the future, there might be an easier way to do this. Once Puppeteer supports response interception, you will be able to simply force the browser to download a document, but as of May 2019 this is not supported.

Any chance you could revisit this if there is anything new that makes it easier? - Trentfrompunchbowl1, Mar 22 '22 at 07:31
