
Background

We use Puppeteer to render PDFs on a Node server. Clients send large query strings to our API, which passes them on to Puppeteer. When Puppeteer loads the web page, JavaScript in the page reads the data from the GET query string and populates the page dynamically. Once the page has rendered, Puppeteer converts it to a PDF, which is downloaded to the client.

Problem

We realized that very large requests break the browser when we hit the API with a GET request. To work around this, we now call the API with a POST request and hash the data so it can be rendered later.

This got us wondering whether there is a maximum URL length for the Puppeteer function that loads the web page used to render the PDF.

Example Code

const browser = await puppeteer.launch({
  args: ['--no-sandbox', '--disable-setuid-sandbox'],
  ignoreHTTPSErrors: true,
  dumpio: false
});

const page = await browser.newPage();

const data = reqMethod === 'POST' ? req.body : JSON.parse(req.query.data);

const {pdfOptions, ...templateData} = data;

const url = `${PDF_API_PROD}/${template}?data=${JSON.stringify(templateData)}`;

await page.goto(url);

const pdfBuffer = await page.pdf({
  format: 'A4',
  margin: {
    top: '20px',
    left: '20px',
    right: '20px',
    bottom: '20px',
  },
  ...pdfOptions,
});

Question

As the code above shows, we pass the data object directly into the URL as a GET parameter. Puppeteer then uses this URL to render the web page.

Once Puppeteer has loaded the web page, JavaScript in the page pulls the data out of the GET string in order to render the page dynamically.

What is the maximum number of characters that can be passed to the Puppeteer function await page.goto(url);?

wuno

1 Answer


There is no hard limit built into the browser. I was able to send URLs of up to 2,000,000 characters to a server without any problems. Beyond that, the only trouble I had was that sending the data simply takes time.

If you are having trouble sending large URLs, it is most likely one of the following two things:

1. The server is not properly configured to receive that amount of data.

To receive that much data, you have to configure your server accordingly. By default, most servers cap the amount of data that can be sent via the URL.

2. You are hitting a timeout

Keep in mind that sending a few MB of data might take some time, depending on your internet connection and the server's upload speed. It might also be slower to send the data in the head of the HTTP request than to send it as a stream inside the body. In my test cases, this was the limiting factor.
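If the timeout is hit on the Puppeteer side, the navigation timeout can be raised. The following is only a minimal sketch: `gotoWithLongTimeout` is a hypothetical helper name, `page` is assumed to be a Puppeteer page object, and the 120 second default is an arbitrary value chosen for illustration.

```javascript
// Hypothetical helper: navigate with a longer timeout for very large URLs.
// `page` is assumed to be a Puppeteer Page; `timeoutMs` is an arbitrary example value.
function gotoWithLongTimeout(page, url, timeoutMs = 120000) {
  // page.goto accepts a `timeout` option in milliseconds.
  // Puppeteer's default is 30000; passing 0 disables the timeout.
  return page.goto(url, { timeout: timeoutMs });
}
```

Usage would look like `await gotoWithLongTimeout(page, url);` in place of the plain `await page.goto(url);` call. This does not help if the timeout is enforced by the receiving server or a proxy in between.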

Therefore: Most likely, the problem you are encountering is not related to puppeteer but to the receiving end.
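To make the "receiving end" point concrete, here is a rough sketch that assumes the page is served by a plain Node `http` server (your setup may use a different server or a proxy with its own limits). Node caps the size of the request headers at 16 KiB by default, and to my understanding the request URL counts toward that cap. The helpers `buildRenderUrl` and `fitsHeaderLimit`, the `invoice` template name and the 1 MiB limit are all made up for illustration.

```javascript
const http = require('http');

// Assumed limit for illustration only: 1 MiB for request line plus headers.
const MAX_HEADER_BYTES = 1024 * 1024;

// Hypothetical helper: build the render URL with the payload percent-encoded,
// so characters such as quotes, '&' and '#' in the JSON survive the trip.
function buildRenderUrl(base, template, data) {
  return `${base}/${template}?data=${encodeURIComponent(JSON.stringify(data))}`;
}

// Hypothetical helper: rough check that the URL alone stays under the limit.
function fitsHeaderLimit(url, limitBytes = MAX_HEADER_BYTES) {
  return Buffer.byteLength(url, 'utf8') < limitBytes;
}

// `maxHeaderSize` overrides Node's default header limit for this server.
const server = http.createServer({ maxHeaderSize: MAX_HEADER_BYTES }, (req, res) => {
  res.end(`received a URL of ${req.url.length} characters`);
});
// server.listen(3000); // left commented out so the sketch has no side effects

const url = buildRenderUrl('http://localhost:3000', 'invoice', { name: 'a b' });
console.log(url.length, fitsHeaderLimit(url));
```

If `fitsHeaderLimit` returns false for a typical payload, sending the data in a POST body (as the question already does for the API call) is likely the more robust route.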

What puppeteer does

You seem to suspect that Puppeteer might truncate the URL. This is not the case. Puppeteer is just a wrapper around the DevTools Protocol. It takes the URL argument, wraps it in the message payload via JSON.stringify and sends it to the browser. I doubt that the DevTools Protocol has any limitation built into Page.navigate. Therefore, Puppeteer itself should not introduce any library-specific limit here.
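The serialization step can be checked in isolation. The snippet below is not Puppeteer's actual code. It only mimics the general shape of a DevTools Protocol message (`id`, `method`, `params`) with a hypothetical `buildNavigateMessage` helper and a made-up localhost URL, to show that JSON.stringify does not shorten a very long string.

```javascript
// Illustration only: mimics the shape of a DevTools Protocol message.
// `buildNavigateMessage` is a hypothetical helper, not part of Puppeteer.
function buildNavigateMessage(id, url) {
  return JSON.stringify({ id, method: 'Page.navigate', params: { url } });
}

// A made-up URL with a 2,000,000 character query value.
const longUrl = 'http://localhost:3000/template?data=' + 'a'.repeat(2000000);

const message = buildNavigateMessage(1, longUrl);
const roundTripped = JSON.parse(message).params.url;

// The URL comes back unchanged after serializing and parsing.
console.log(roundTripped.length === longUrl.length); // true
```

This only covers the wrapping step. Whatever happens to the URL after the browser receives the message is up to the browser and the server.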

Thomas Dondorf
  • I appreciate your answer. I can see that many resources throughout the internet say that the max chars you can send in a URL is 2083. IE for example has problems with more than that. Firefox can handle more. We know that the URL gets cut off when over a certain amount in some browsers and we found a work around for that when making a GET request. I am curious more specifically about Puppeteer and if the method used to load the page it renders has some type of max. – wuno May 21 '19 at 13:14
  • [This post](https://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers) has some more information on that topic. Puppeteer is just sending the URL to the browser via `JSON.stringify`. I added more information with links to the actual code in the puppeteer library to my answer. If you encounter any limitations, they will be built into the browser, not into puppeteer. – Thomas Dondorf May 21 '19 at 16:43
  • Thomas when `await client.send('Page.navigate', {url, referrer, frameId});` fires, is it actually making a GET request in Chromium to load a page somewhere behind the scenes and is identical to me typing a URL into Chrome and hitting enter? – wuno May 22 '19 at 00:52
  • @wuno I do not think it is identical to typing into the address bar, as the address bar might have some length restrictions (see the post I linked in the comment above). But it should be identical to typing `location.href = '...'` in the DevTools console. – Thomas Dondorf May 22 '19 at 05:45