
I'm building a web scraper made up of a series of function calls that pass around a Puppeteer page object together with a CSS selector string. The goal is to extract the matching HTML element and apply an arbitrary modification to it, supplied as a function efn.

The "main" function, which receives the page and calls page.evaluate:

// efn is the evaluate function, bound with page.exposeFunction
const catElem = async ({page, selector, rwr = true, slug, efn} = {}) => {

    if (efn) {
        console.log("fn function added!");
        // hard-coded here; the real code binds efn instead
        await page.exposeFunction("fn", x => `great ${x}`);
    }
    if (page) {
        const categoryElems = await page.evaluate(async () => {
            // elem = document.querySelector(selector);

            // window.fn only exists if exposeFunction was called above
            if (!window.fn)
                return 5; // elem;

            // the passed-in function should return what the caller requires
            return await window.fn(5);
        });

        if (!rwr)
            return categoryElems;
        return getShortCategories({slug: slug, pc: categoryElems});
    }
    else
        return getShortCategories({slug: slug});

};

This is a simplified version of the function I'm using. It receives a function efn to be bound with page.exposeFunction, as in await page.exposeFunction("efn", efn), but here I've hard-coded the function as x => `great ${x}`, following the approach shown in: How to pass a function in Puppeteers .evaluate() method?
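
As a rough illustration of the binding described above, here is a minimal sketch. `callExposed` and its parameters are hypothetical names that do not appear in the original code, and `page` is assumed to be a Puppeteer `Page`. The point it tries to show is that a function registered through `page.exposeFunction` runs in Node, so only serializable values cross over from the browser, not DOM elements.

```javascript
// Hypothetical helper (not from the original code): bind `efn` on the page
// under `name`, then call it once from inside the browser.
const callExposed = async (page, name, efn, arg) => {
    // exposeFunction installs window[name] in the page. Calling it there
    // sends the arguments to Node, where efn runs, and resolves with the result.
    await page.exposeFunction(name, efn);

    // name and arg are passed as evaluate arguments because the callback
    // is serialized and cannot see Node-side variables through a closure.
    return page.evaluate((n, a) => window[n](a), name, arg);
};
```

Called with the hard-coded function from the question and the argument 5, this should resolve to "great 5". As far as I know, Puppeteer rejects a second `exposeFunction` call that reuses a name already bound on the same page, so a helper like this can bind a given name only once per page.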

Another function calls it like this: let categoryElems = await catElem({page: page, selector: selector, efn: efn, rwr: false});

Although catElem returns great 5, it still produces the long error below, and it fails at the main thing I wanted to achieve, i.e. executing the passed-in function efn.

(node:13021) UnhandledPromiseRejectionWarning: Error: Execution context is not available in detached frame "about:blank" (are you trying to evaluate?)
    at DOMWorld.executionContext (/home/faiz/CodeFiles/Python & Django/nwa_api/scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/DOMWorld.js:86:19)
    at DOMWorld._onBindingCalled (/home/faiz/CodeFiles/Python & Django/nwa_api/scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/DOMWorld.js:372:36)
    at /home/faiz/CodeFiles/Python & Django/nwa_api/scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/DOMWorld.js:55:66
    at /home/faiz/CodeFiles/Python & Django/nwa_api/scraper/node_modules/puppeteer/lib/cjs/vendor/mitt/src/index.js:51:62
    at Array.map (<anonymous>)
    at Object.emit (/home/faiz/CodeFiles/Python & Django/nwa_api/scraper/node_modules/puppeteer/lib/cjs/vendor/mitt/src/index.js:51:43)
    at CDPSession.emit (/home/faiz/CodeFiles/Python & Django/nwa_api/scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/EventEmitter.js:72:22)
    at CDPSession._onMessage (/home/faiz/CodeFiles/Python & Django/nwa_api/scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:245:18)
    at Connection._onMessage (/home/faiz/CodeFiles/Python & Django/nwa_api/scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:117:25)
    at WebSocket.<anonymous> (/home/faiz/CodeFiles/Python & Django/nwa_api/scraper/node_modules/puppeteer/lib/cjs/puppeteer/node/NodeWebSocketTransport.js:13:32)
    at WebSocket.onMessage (/home/faiz/CodeFiles/Python & Django/nwa_api/scraper/node_modules/ws/lib/event-target.js:199:18)
    at WebSocket.emit (events.js:400:28)
    at Receiver.receiverOnMessage (/home/faiz/CodeFiles/Python & Django/nwa_api/scraper/node_modules/ws/lib/websocket.js:1022:20)
    at Receiver.emit (events.js:400:28)
    at Receiver.dataMessage (/home/faiz/CodeFiles/Python & Django/nwa_api/scraper/node_modules/ws/lib/receiver.js:522:14)
    at Receiver.getData (/home/faiz/CodeFiles/Python & Django/nwa_api/scraper/node_modules/ws/lib/receiver.js:440:17)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:13021) UnhandledPromiseRejectionWarning: Unhandled promise rejection. 
This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). 
(rejection id: 1)

(The same warning repeats 17 times, up to rejection id: 17.)

Any reason why this error is being produced?

EDIT: The function can be called as follows:

await page.goto('https://www.foxnews.com/us/missouri-interstate-crash-five-dead');
const selector = '.eyebrow > a:nth-child(1)';
const efn = (x) => x.getAttribute('href');
await catElem({page: page, selector: selector, rwr: false, efn: efn});
Faiz Ahmed
  • This works fine for me, so in your attempt to create a minimal reproducible example you may have missed important code or context. It's not clear to me that you need `exposeFunction`, which simply lets the console code invoke a function in Puppeteer/Node's context. If you just want to pass data to the window, that can be done more easily with parameters to `evaluate`, but I'm not sure what you're trying to do. – ggorlen Mar 20 '22 at 14:08
  • For instance, if I needed to do `document.querySelector('.headline').textContent.trim()` in `page.evaluate`, I would like the function in the OP to make the `querySelector` call inside the browser, while `.textContent.trim()` would be provided in another function `efn`, as `x => x.textContent.trim()`, to execute in the Node/Puppeteer context, as `exposeFunction` allows. As a side note, I have lots of calls to `document.querySelector`; this was my attempt at making the code DRY. – Faiz Ahmed Mar 20 '22 at 18:21
  • `.querySelector`, `.textContent` are purely DOM/browser functions/properties, not Node/Puppeteer functions, so you don't need to expose them, and in fact your code should fail if you try to call them inside Node using `exposeFunction`. If you want to add a function to the window, you can simply do `page.evaluate(() => {window.fn = () => whatever;})`, then use it whenever you need. – ggorlen Mar 20 '22 at 18:26
  • Also, I dug around quite a bit. It seems the error `Execution context is not available` is produced after `page.goto` is called. The page passed in has been navigated to a URL given as a command-line argument. The URL that produced this error was [url](https://www.foxnews.com/us/missouri-interstate-crash-five-dead) – Faiz Ahmed Mar 20 '22 at 18:29
  • The error is being produced in this minimal example too. – Faiz Ahmed Mar 20 '22 at 18:40
  • The example isn't complete or runnable, unfortunately. Do you have a snippet I can copy, paste and run to reproduce the problem? – ggorlen Mar 20 '22 at 18:43
  • `await page.goto('https://www.foxnews.com/us/missouri-interstate-crash-five-dead'); selector = '.eyebrow > a:nth-child(1)'; efn = (x) => x.getAttribute('href'); await catElem({page: page, selector: selector, rwr: false, efn: efn});` – Faiz Ahmed Mar 20 '22 at 18:55
  • I would suggest editing that into the post so it's a clean single script without ambiguity. But yeah, what's `getAttribute` in Node? Again, `exposeFunction` exposes a Node function, which isn't going to have access to browser stuff like `x.getAttribute("href")`. I think you have some fundamental misunderstandings that'd be easier to resolve if you provide a clear explanation of [what you're really trying to achieve](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/233676#233676) without presupposing that `exposeFunction` is the right way to achieve it. – ggorlen Mar 20 '22 at 19:07
  • Also, I tried setting `window.fn=somefunc` in `page.evaluate`. In my case doing that is not possible since it requires hardcoding the fn as done in this example. If I wanted to do `elem = document.querySelector('.some-class').textContent` and `elem = document.querySelector('.some-other-class').getAttribute('href')`, the function in the OP does the `document.querySelector` job; `efn` would be `x => x.textContent` in the first case and `x => x.getAttribute('href')` in the second. `efn` would be called as `window.efn(elem)`. In short, I want to pass a function to `page.evaluate`. – Faiz Ahmed Mar 20 '22 at 19:24
  • If doing `someElement.textContent` or `someElement.someOtherThing()` is not possible in `exposeFunction`, this whole approach I'm taking is pointless and futile. I'll resort to a simple `page.evaluate` doing all the different `document.querySelector` calls for all the different elements I'm trying to scrape. Thank you for your help @ggorlen, and also for that meta post. – Faiz Ahmed Mar 20 '22 at 19:49
  • "i tried setting `window.fn=somefunc` in `page.evaluate`. In my case doing that it's not possible since it requires hardcoding the fn as done in this example". What do you mean by hardcoding? You can totally write wrappers on `querySelector`, `getAttribute` and `textContent`, but I'm not sure what these functions are supposed to look like or where/how you want to use them because the code you've provided feels stubby, incomplete and difficult to follow. "In short I want to pass a function to `page.evaluate`" -- what function? A Node or a browser console function? What does this function do? – ggorlen Mar 20 '22 at 20:04
  • There's an arg called `efn` in the OP function. I wanted this to be a function that performs any HTML element operation, not just `getAttribute` or `textContent`, on the result of `querySelector`. `document.querySelector` would be called regardless of what `efn` is (see the commented-out line in the OP). I wanted this `efn` browser function to be evaluated in the browser's context. I thought `exposeFunction` would give the browser's context to Node, which would execute it in that context. Now I see that doesn't happen. – Faiz Ahmed Mar 20 '22 at 20:49
  • OK, that's a bit clearer. `exposeFunction` exposes Node to the browser, `evaluate` lets you attach a function to the window to run in browser context as I showed above. I think this is a dupe of [Is there a way to add script to add new functions in evaluate() context of chrome+puppeeter?](https://stackoverflow.com/questions/48476356/is-there-a-way-to-add-script-to-add-new-functions-in-evaluate-context-of-chrom) – ggorlen Mar 20 '22 at 21:07
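
The direction the last comments point to, running `efn` in the browser rather than in Node, can be sketched with `page.$eval`, which runs `document.querySelector(selector)` in the page and hands the matched element to a callback that is itself executed in the browser. This is only a sketch under assumptions: `extract` is a hypothetical name, `page` is assumed to be a Puppeteer `Page` that has already finished navigating, and the default `efn` is made up for illustration.

```javascript
// Hypothetical helper: run `efn` in the browser against the first element
// matching `selector`. Puppeteer serializes efn, so it may use DOM APIs
// but cannot close over Node-side variables.
const extract = async ({page, selector, efn = (el) => el.outerHTML} = {}) => {
    if (!page)
        throw new TypeError("a Puppeteer page is required");

    // page.$eval performs the querySelector call in the page and passes the
    // matched element to efn; it rejects if no element matches the selector.
    return page.$eval(selector, efn);
};
```

With the selector from the question, `await extract({page, selector, efn: x => x.getAttribute('href')})` would be expected to return the link's `href`, and `efn: x => x.textContent.trim()` its text. Any extra values `efn` needs would have to be passed as additional `$eval` arguments, since closures do not survive serialization.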
