
EDIT for Mission Clarity: In the end I am pulling inventory data and customer data from Postgres to render and send a bunch of PDFs to customers, once per month. These PDFs are dynamic in that the cover page will have a varying customer name/address. The next page(s) are also dynamic, as they are lists of a particular customer's expiring inventory with item/expiry date/serial number.

I had made a client-side React page with print CSS to render some print-layout letters that could be printed off/saved as a pretty PDF.

Then, the waterfall spec came in that this was to be an automated process on the server. Basically, the PDF needs to be attached to an email alerting customers of expiring product (in the medical industry, where everything needs to be audited).

I thought using Puppeteer would be a nice and easy switch. Just add a route that processes all customers, looking up whatever may be expiring, and then passing that into the dynamic react page to be rendered headless to a PDF file (and eventually finish the whole rest of the plan, sending email, etc.). Right now I just grab 10 customers and their expiring stock for PoC, then I have basically: { customer: {}, expiring: [] }.

I've attempted using POST to the page with an interrupt, but I guess it makes sense that I cannot get POST data in the browser. So, I switched my approach to using cookies. I would expect this to work, but I can never read the cookie(s) in the page.

Here is a simple route, a simple Puppeteer script that writes the cookies out to a JSON file and takes a screenshot just for proof, and a simple HTML page with a script I'm using just to try to prove I can pass data along.

server/index.js:

app.get('/testing', async (req, res) => {
    console.log('GET /testing');
    res.sendFile(path.join(__dirname, 'scratch.html'));
});

scratch.js (run at the command line with node ./scratch.js):

const puppeteer = require('puppeteer')
const fs = require('fs');
const myCookies = [{name: 'customer', value: 'Frank'}, {name: 'expiring', value: JSON.stringify([{a: 1, b: 'three'}])}];

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  
  await page.goto('http://localhost:1234/testing', { waitUntil: 'networkidle2' });
  await page.setCookie(...myCookies);

  const cookies = await page.cookies();
  const cookieJson = JSON.stringify(cookies);

  // Writes expected cookies to file for sanity check.
  fs.writeFileSync('scratch_cookies.json', cookieJson);
  
  // FIXME: Cookies never get appended to page.
  await page.screenshot({path: 'scratch_shot.png'});
  await browser.close();
})();

server/scratch.html:

<html>
    <body>
    </body>
    <script type='text/javascript'>
        document.write('Cookie: ' + document.cookie);
    </script>
</html>

The result is just a PNG with the word "Cookie:" on it. Any insight appreciated!

This is the actual route I'm using where makeExpiryLetter is utilizing puppeteer, but I can't seem to get it to actually read the customer and rows data.

app.get('/create-expiry-letter', async (req, res) => {
    // Create PDF file using puppeteer to render React page w/ data.
    // Store in Db.
    // Email file.
    // Send final count of letters sent back for notification in GUI.
    const cc = await dbo.getConsignmentCustomers();
    const result = await Promise.all(cc.rows.map(async x => {
        // Get 0-60 day consignments by customer_id;
        const { rows } = await dbo.getExpiry0to60(x.customer_id);
        if (rows && rows.length > 0) {
            const expiryLetter = await makeExpiryLetter(x, rows); // Uses puppeteer.
            // TODO: Store in Db / Email file.
            return true;
        } else {
            return false;
        }
    }));
    res.json({ emails_sent: result.filter(x => x === true).length });
});

Thanks to the samples from @ggorlen I've made huge headway in using cookies. In my inline script of expiry.html I'm grabbing the values by wrapping my render function in function main() and adding onload to the body tag: <body onload='main()'>. Inside the main function I can grab the values I needed:

const customer = JSON.parse(document.cookie.split('; ').find(row => row.startsWith('customer')).split('=')[1]);
const expiring = JSON.parse(document.cookie.split('; ').find(row => row.startsWith('expiring')).split('=')[1]);
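A note on the two lines above: `startsWith('customer')` would also match a cookie such as `customer_id`, and `split('=')[1]` drops anything after a second `=` in the value. Below is a minimal sketch of a stricter reader. The names `readCookie` and `readJsonCookie` are hypothetical, and it assumes the values were passed through `encodeURIComponent` before being set, which the snippets above do not do.

```javascript
// Hypothetical helper: read one cookie by exact name from a
// document.cookie-style string ("a=1; b=2").
function readCookie(cookieString, name) {
  for (const pair of cookieString.split('; ')) {
    const idx = pair.indexOf('=');
    if (idx === -1) continue; // skip malformed entries
    if (pair.slice(0, idx) === name) {
      // Assumption: the value was encoded with encodeURIComponent when set.
      return decodeURIComponent(pair.slice(idx + 1));
    }
  }
  return undefined;
}

// Hypothetical helper: same lookup, then parse the value as JSON.
function readJsonCookie(cookieString, name) {
  const raw = readCookie(cookieString, name);
  return raw === undefined ? undefined : JSON.parse(raw);
}
```

Inside `main()` this would be called as `readJsonCookie(document.cookie, 'customer')` and `readJsonCookie(document.cookie, 'expiring')`.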

FINALLY (and yes, of course this will all be used in an automated worker in the end) I can get my beautifully rendered PDF like so:

(async () => {
  const browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setCookie(...myCookies);
  await page.goto('http://localhost:1234/testing');
  await page.pdf({ path: `scratch-expiry-letter.pdf`, format: 'letter' });
  await browser.close();
})();
Neil Gaetano Lindberg
  • Cookies seem like a very roundabout way to get data onto a page with Puppeteer. Even after you set the cookies on the page, you'd still have to read them in the app. Since the React page at localhost is under your full control, why not provide a separate server route or url GET params to pass the data? After you get this sorted out, I'd do all of this in a worker task in any case, launching a browser on every request should block up the works pretty good. – ggorlen May 25 '21 at 18:39
  • @ggorlen I need the browser to render the HTML/CSS into a PDF though. – Neil Gaetano Lindberg May 25 '21 at 18:59
  • Why would you need cookies for that? Also, this is an honest question, not rhetorical: is using Puppeteer to screenshot React pages a common way to generate PDFs these days? Isn't there a more direct method or library that doesn't require a whole web UI framework and heavy/slow browser automation to programmatically generate a PDF? (I don't work with PDFs much, just Puppeteer and React, and I'm curious what the state of the art for PDF generation is) – ggorlen May 25 '21 at 19:02
  • I hear you. This all came about because I spent a couple days making this all client-side with client-side tech. I was then told this is to be all automated. I didn't want to throw my work away. I do have a worker in my more robust code, but yeah, this is an attempt to use varying data sets on the same client-side stack, but on the server. Not saying optimal, but I do think it is cool. I'm switching to get params in my route. I still don't get why one couldn't setup state with cookies though. Like if you're testing and you want to pass various dynamic components to test how they render... – Neil Gaetano Lindberg May 25 '21 at 19:12
  • I never said you can't use cookies, just that it makes no sense to me. I'm less concerned about a minor performance difference ("optimal") and more concerned about "total performance trainwreck". Putting your browser launch in the route callback is like launching a space shuttle on every request -- ultra-heavy. BTW, in your original code, `document.write('Cookie: ' + document.cookie);` runs before the cookies are set by Puppeteer. Make that a function that you can invoke using Puppeteer and see if it sees the cookie then, or try setting the cookies before navigating to the page. – ggorlen May 25 '21 at 19:14

1 Answer


The problem is here:

await page.goto('http://localhost:1234/testing', { waitUntil: 'networkidle2' });
await page.setCookie(...myCookies);

The first line says, go to the page. Going to a page involves parsing the HTML and executing scripts, including your document.write('Cookie: ' + document.cookie); line in scratch.html, at which time there are no cookies on the page (assuming a clear browser cache).

After the page is loaded, await page.goto... returns and the line await page.setCookie(...myCookies); runs. This correctly sets your cookies and the remaining lines execute. const cookies = await page.cookies(); runs and pulls the newly-set cookies out and you write them to disk. await page.screenshot({path: 'scratch_shot.png'}); runs, taking a shot of the page without the DOM updated with the new cookies that were set after the initial document.write call.

You can fix this problem by turning your JS on the scratch.html page into a function that can be called after page load and cookies are set, or injecting such a function dynamically with Puppeteer using evaluate:

const puppeteer = require('puppeteer');

const myCookies = [
  {name: 'customer', value: 'Frank'}, 
  {name: 'expiring', value: JSON.stringify([{a: 1, b: 'three'}])}
];

(async () => {
  const browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.goto('http://localhost:1234/testing');
  await page.setCookie(...myCookies);

  // now that the cookies are ready, we can write to the document
  await page.evaluate(() => document.write('Cookie: ' + document.cookie));

  await page.screenshot({path: 'scratch_shot.png'});
  await browser.close();
})();

A more general approach is to set the cookies before navigation. This way, the cookies will already exist when any scripts that might use them run.

const puppeteer = require('puppeteer');

const myCookies = [
  {
    name: 'expiring',
    value: '[{"a":1,"b":"three"}]',
    domain: 'localhost',
    path: '/',
    expires: -1,
    size: 29,
    httpOnly: false,
    secure: false,
    session: true,
    sameParty: false,
    sourceScheme: 'NonSecure',
    sourcePort: 80
  },
  {
    name: 'customer',
    value: 'Frank',
    domain: 'localhost',
    path: '/',
    expires: -1,
    size: 13,
    httpOnly: false,
    secure: false,
    session: true,
    sameParty: false,
    sourceScheme: 'NonSecure',
    sourcePort: 80
  }
];

(async () => {
  const browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setCookie(...myCookies);
  await page.goto('http://localhost:1234/testing');
  await page.screenshot({path: 'scratch_shot.png'});
  await browser.close();
})();
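The cookie objects above carry many more properties than are needed. As a rough sketch, a small helper can build minimal cookie params from plain data. The helper name `buildCookies` is hypothetical. It assumes every value is JSON-stringified and URI-encoded, so the page has to `decodeURIComponent` and `JSON.parse` on its side, and it attaches a `url` to each cookie because cookies set before navigation need a `url` or `domain`.

```javascript
// Hypothetical helper: turn a plain object into cookie params that can be
// spread into page.setCookie(). Each cookie gets a `url` so it can be set
// before the page has navigated anywhere.
function buildCookies(data, url) {
  return Object.entries(data).map(([name, value]) => ({
    name,
    // Assumption: every value is JSON-stringified, then URI-encoded so that
    // characters such as ';', ',' and spaces survive in a cookie value.
    value: encodeURIComponent(JSON.stringify(value)),
    url,
  }));
}

const cookies = buildCookies(
  { customer: { name: 'Frank' }, expiring: [{ a: 1, b: 'three' }] },
  'http://localhost:1234/testing'
);

// With Puppeteer, before page.goto():
//   await page.setCookie(...cookies);
```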

That said, I'm not sure if cookies are the easiest or best way to do what you're trying to do. Since you're serving HTML, you could pass the data along with it statically, expose a separate API route to collect a customer's data which the front end can use, or pass GET parameters, depending on the nature of the data and what you're ultimately trying to accomplish.
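As a rough illustration of the GET-parameter option, here is a sketch using the built-in `URL` and `URLSearchParams` classes. The function names `buildLetterUrl` and `parseLetterParams` are hypothetical, and the parameter names `customer` and `expiring` are only borrowed from the question's data shape.

```javascript
// Hypothetical helper for the Puppeteer side: encode the data into the URL
// that will be passed to page.goto().
function buildLetterUrl(baseUrl, { customer, expiring }) {
  const url = new URL(baseUrl);
  url.searchParams.set('customer', JSON.stringify(customer));
  url.searchParams.set('expiring', JSON.stringify(expiring));
  return url.toString();
}

// Hypothetical helper for the page side: decode the same data, e.g. from
// window.location.search.
function parseLetterParams(search) {
  const params = new URLSearchParams(search);
  return {
    customer: JSON.parse(params.get('customer')),
    expiring: JSON.parse(params.get('expiring')),
  };
}

const letterUrl = buildLetterUrl('http://localhost:1234/testing', {
  customer: { name: 'Frank' },
  expiring: [{ a: 1, b: 'three' }],
});

// With Puppeteer: await page.goto(letterUrl);
// In the page:    const { customer, expiring } = parseLetterParams(window.location.search);
```

URLs have practical length limits, so for long inventory lists the separate API route is probably the safer of the options.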

You could even have a file upload form on the React app, then have Puppeteer upload the JSON data into the app programmatically through that form.

In fact, if your final goal is to dynamically generate a PDF, using React and Puppeteer might be overkill, but I'm not sure I have a better solution to offer without some research and additional context about your use case.

ggorlen
  • I went with GET query parameters, which brought me also to es6renderer, but I'm hitting a wall with that now. I can't seem to destructure or otherwise assign arguments that are passed into the template engine for use in the page. I'm going to update my mission statement for more clarity. – Neil Gaetano Lindberg May 26 '21 at 15:47
  • That seems like a new issue. Can you open a separate question? Otherwise, the goalpost moves on this thread and my answer potentially becomes irrelevant -- other folks in the future might have this same problem. Feel free to @ me when you create it. If this answer solves the problem, you might consider [accepting it](https://stackoverflow.com/help/someone-answers) – ggorlen May 26 '21 at 15:58
  • I'll note for others: the second example has cookies with many properties, but name, value, and domain are definitely required. With just name and value, Puppeteer throws `Error: Protocol error (Network.deleteCookies): At least one of the url and domain needs to be specified` – Neil Gaetano Lindberg May 26 '21 at 17:28