
Question

How do I expose an object with a bunch of methods to puppeteer? I am trying to retain the definition of the parent object and method (i.e. foo.one) within page.evaluate, if possible. In other words, I am looking for console.log(foo.one('world')), typed as such, to return world.

Background

foo is a library container which returns a whole bunch of (relatively) pure functions. These functions are required both in the main script context AND within the puppeteer browser. I would prefer to not have to redefine each of them within page.evaluate and instead pass this entire "package" to page.evaluate for repository readability/maintenance. Nonetheless, as one answer suggests below, iterating over the methods from foo and exposing them individually to puppeteer with a different name isn't a terrible option. It just would require redefinitions within page.evaluate which I am trying to avoid.

Expected vs Actual

Let's assume an immediately invoked function which returns an object with a series of function definitions as properties. When trying to pass this IIFE (or object) to puppeteer page, I receive the following error:

import puppeteer from 'puppeteer'

const foo = (()=>{
    const one = (msg) => console.log('1) ' + msg)
    const two = (msg) => console.log('2) ' + msg)
    const three = (msg) => console.log('3) ' + msg)
    return {one, two, three}
})()

const browser = await puppeteer.launch().catch(err => `Browser not launched properly: ${err}`)
const page = await browser.newPage()
page.on('console', (msg) => console.log('PUPPETEER:', msg.text())); // Pipe puppeteer console to local console (text() is the public accessor; _text is private)

await page.evaluate((foo)=>{
    console.log('hello')
    console.log(foo.one('world'))
},foo)

browser.close()

// Error: Evaluation failed: TypeError: foo.one is not a function

When I try to use page.exposeFunction I receive an error. This is to be expected because foo is an object.

page.exposeFunction('foo',foo)

// Error: Failed to add page binding with name foo: [object Object] is not a function or a module with a default export.
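
The first error (foo.one is not a function) is consistent with how arguments reach the page: as far as I understand it, anything passed to page.evaluate is serialized on the Node side and rebuilt in the browser, and functions have no serialized form. The sketch below is only an approximation that uses a JSON round trip in plain Node (no Puppeteer involved); the `version` property is a hypothetical addition to show that plain data survives while the methods do not.

```javascript
// Same shape as the question's `foo`: an object whose values are functions.
// `version` is a hypothetical extra property, added only for illustration.
const foo = (() => {
  const one = (msg) => '1) ' + msg;
  const two = (msg) => '2) ' + msg;
  return { one, two, version: 1 };
})();

// Rough stand-in for what happens to an argument on its way to the page:
// it is serialized, and function-valued properties are dropped.
const received = JSON.parse(JSON.stringify(foo));

console.log(typeof foo.one);      // "function"
console.log(typeof received.one); // "undefined"
console.log(received);            // { version: 1 }
```

So inside page.evaluate the callback sees an object without its methods, and calling foo.one throws.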

The control case, defining the function within the browser page, works as expected:

import puppeteer from 'puppeteer'

const browser = await puppeteer.launch().catch(err => `Browser not launched properly: ${err}`)
const page = await browser.newPage()
page.on('console', (msg) => console.log('PUPPETEER:', msg.text())); // Pipe puppeteer console to local console (text() is the public accessor; _text is private)

await page.evaluate(()=>{
    const bar = (()=>{
        const one = (msg) => console.log('1) ' + msg)
        const two = (msg) => console.log('2) ' + msg)
        const three = (msg) => console.log('3) ' + msg)
        return {one, two, three}
    })()
    console.log('hello')
    console.log(bar.one('world'))
})
browser.close()

// PUPPETEER: hello
// PUPPETEER: 1) world

Update (5/19/2022)

Adding a quick update after testing the solutions below against my use case.

Reminder: I am trying to pass an externally defined utilities.js library to the browser so that it can conditionally interact with page data and navigate accordingly.

I'm open to any ideas or feedback!

addScriptTag()

Unfortunately, passing a node.js module of utility functions is very difficult in my situation. When the module contains export statements or objects, addScriptTag() fails.

I get Error: Evaluation failed: ReferenceError: {x} is not defined in this case. I created an intermediary function to remove the export statements. That is messy but it seemed to work. However, some of my functions are IIFE which return an object with methods. And objects are proving very hard to work with via addScriptTag(), to say the least.

redundant code

I think for smaller projects the simplest and best option is to just re-declare the objects/functions in the puppeteer context. I hate redefining things but it works as expected.

import()

As @ggorlen suggests, I was able to host the utilities function on another server. This can be sourced by both the node.js and puppeteer environments. I still had to import the library twice: once in the node.js environment and once in the browser context. But it's probably better in my case than redeclaring dozens of functions and objects.

jmtornetta
  • Similar to the ramda functions but customized. They will perform logical tests, temporal operations, calculations and more from the data that puppeteer aggregates. I don't own the web assets that I am using puppeteer on so I think I have to use Node. I suppose I could `import` the object again, from within the puppeteer context. Or write the script tag to the page. Assuming I put them all up on GitHub and source from there. Is that what you suggest? – jmtornetta May 07 '22 at 23:23
  • Is the data they're operating on in the browser or in Puppeteer (and for some reason needs to be triggered by the console)? The script tag could be purely local, if they're just project-specific. I can add an answer but I still feel like I'm guessing your use case a bit and the details (probably?) matter. `page.exposeFunction` is for when you want to trigger a Node function from the browser, which is sort of a different use case than if you just want the code to run purely in the browser and operate on the data there. Typically, though, data is passed back to Node for most processing eventually. – ggorlen May 07 '22 at 23:27
  • Yeah the data is in the puppeteer browser. So I am needing these functions to test conditions to determine what data should be 'pulled' from the browser. E.g. puppeteer finds some data and then I use these functions to test if the data has a certain date, null values, etc. This in turn determines if/how page navigation should occur. And that determines what data will ultimately be returned from puppeteer to the node server. – jmtornetta May 07 '22 at 23:37
  • Some of these functions are used by the main program (outside of puppeteer) too, since they are utility functions. Which is why I was thinking to have them defined in both places. But I think I see what you are getting at. Put what puppeteer needs in puppeteer and what the main program needs in that scope. – jmtornetta May 07 '22 at 23:44

2 Answers

2

It might be repetitive when calling, but you could iterate over the object and call page.exposeFunction for each.

await page.exposeFunction('fooOne', foo.one); // exposeFunction returns a Promise
// ...

or

for (const [fnName, fn] of Object.entries(foo)) {
  await page.exposeFunction(fnName, fn); // wait for each binding to be registered
}

If the functions can all be executed in the context of the page, simply defining them inside a page.evaluate would work too.

page.evaluate(() => {
    window.foo = (()=>{
        const one = (msg) => console.log('1) ' + msg)
        const two = (msg) => console.log('2) ' + msg)
        const three = (msg) => console.log('3) ' + msg)
        return {one, two, three}
    })();
});

If you have to have only a single object containing the functions in the context of the page, you could first put an object on the window with page.evaluate, then in the main script, have a serial async loop over the keys and values of the object that:

  • calls page.exposeFunction('fnToMove', fn) with the function
  • calls page.evaluate, which assigns fnToMove to the desired property on the object created earlier

But that's somewhat convoluted. I wouldn't recommend it unless you really need it.
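
For what it's worth, the two steps above could be sketched roughly as follows. This is an untested outline, not a recommendation: `exposeObject` and the `__name_key` binding names are hypothetical, and it assumes a binding created by page.exposeFunction keeps working when it is referenced from another property.

```javascript
// Hypothetical helper: rebuilds `obj` in the page as `globalThis[name]`,
// with each method forwarding to the Node-side function through a binding.
async function exposeObject(page, name, obj) {
  // 1) Create an empty container object in the page.
  await page.evaluate((n) => { globalThis[n] = {}; }, name);

  // 2) Serially expose each function, then attach it to the container.
  for (const [key, fn] of Object.entries(obj)) {
    if (typeof fn !== 'function') continue; // only functions can be exposed
    const binding = `__${name}_${key}`;     // hypothetical binding name
    await page.exposeFunction(binding, fn);
    await page.evaluate(
      (n, k, b) => { globalThis[n][k] = globalThis[b]; },
      name, key, binding
    );
  }
}

// Usage, assuming `page` is a Puppeteer Page and `foo` is the object from the question:
//   await exposeObject(page, 'foo', foo);
//   await page.evaluate(async () => { await foo.one('world'); });
```

Two things to keep in mind: the exposed functions still run in Node, so in the page foo.one returns a Promise and its console.log prints to the Node console; and the container is created by page.evaluate, so it would likely need to be rebuilt after a navigation.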

CertainPerformance
  • Like the idea, though I am trying to avoid editing the definitions within the `page.evaluate` statement (if even possible). Would there be a way of accomplishing this without renaming `foo.one` to `fooOne`? While still retaining the parent object (i.e. `foo`)? – jmtornetta May 07 '22 at 22:31
  • Feel free to pass whatever name for the function you want. The second code block uses just the key names. – CertainPerformance May 07 '22 at 22:33
  • ^ I will update my question to include these limitations – jmtornetta May 07 '22 at 22:35
1

This is a bit speculative, because the use case matters quite a bit here. For example, exposeFunction means the code runs in Node context, so that involves inter-process communication and data serialization and deserialization, which seems inappropriate for your use case of processing the data fully in the browser. Then again, if there are Node-specific tasks like reading files or making cross-origin requests, it's appropriate.

If, on the other hand, you want to add code for the browser to call in the page context, a scalable way is to put your library into a script, then use page.addScriptTag({path: "./your-lib.js"}) to attach it to the window. Either use a bundler to build the lib for browser compatibility or attach it by hand. Use module.exports if you also want to import it in Node.

For example:

foo.js

;(function () {
  var foo = {
    one: function () { return 1; },
    two: function () { return 2; },
    // ...
  };

  if (typeof module === "object" &&
      typeof module.exports === "object") {
    module.exports = foo;
  }

  if (typeof window === "object") {
    window.foo = foo;
  }
})();

foo-tester.js

const puppeteer = require("puppeteer"); // ^13.5.1
const foo = require("./foo"); // also use it in Node if you want...

let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  await page.addScriptTag({path: "./foo.js"});
  console.log(foo.one()); // => 1
  console.log(await page.evaluate(() => foo.two())); // => 2
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;

addScriptTag also works for modules and raw JS strings. For example, this works too:

await page.addScriptTag({content: `
  window.foo = {two() { return 2; }};
`});
console.log(await page.evaluate(() => foo.two())); // => 2

A hacky approach is to stringify the object of functions you may have in Node. I don't recommend this, but it's possible:

const foo = {
  one() { return 1; },
  two() { return 2; },
};
const fooToWindow = `window.foo = {
  ${Object.values(foo).map(fn => fn.toString())}
}`;
await page.addScriptTag({content: fooToWindow});
console.log(await page.evaluate(() => foo.two())); // => 2
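
Note that the snippet above relies on method shorthand (`one() { ... }`), whose source text is already a valid object member. For arrow functions like the ones in the question, the key has to be written out explicitly. Here is a minimal sketch of that variant; `toWindowScript` is a hypothetical helper, and it assumes every function is an arrow function or a `function` expression that doesn't reference outer variables, since closures are lost when stringifying.

```javascript
// Hypothetical helper: builds a script string that recreates `obj` in the page.
// Assumes each value is an arrow function or `function` expression with no
// references to variables outside its own body (closures do not survive).
function toWindowScript(name, obj) {
  const props = Object.entries(obj)
    .filter(([, fn]) => typeof fn === 'function')
    .map(([key, fn]) => `${JSON.stringify(key)}: ${fn.toString()}`)
    .join(',\n  ');
  return `globalThis[${JSON.stringify(name)}] = {\n  ${props}\n};`;
}

// Same IIFE shape as the question, returning values instead of logging.
const foo = (() => {
  const one = (msg) => '1) ' + msg;
  const two = (msg) => '2) ' + msg;
  return { one, two };
})();

const script = toWindowScript('foo', foo);
console.log(script);

// With Puppeteer, assuming `page` is an open Page:
//   await page.addScriptTag({ content: script });
//   console.log(await page.evaluate(() => foo.one('world')));
```

The same caveat applies: this is still a hack, and a shared script file loaded with addScriptTag is easier to maintain.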

See also Is there a way to use a class inside Puppeteer evaluate?

ggorlen