UPDATE: I am running in Docker and using puppeteer version 1.11.0, since this is the latest version that is supported on Alpine Linux. I am also running with --no-sandbox.

Just for the sake of code organization, I'd like to do this in puppeteer...

async function crawler(url, evaluater) {
    const browser = await puppeteer.launch(...)
    const page = await browser.newPage()
    await page.goto(url)
    const result = await page.evaluate(evaluater)
    return result

}

crawler('https://website.com', () => {
    return document.querySelectorAll(...)
})

But I get the following error:

Error: Evaluation failed: TypeError: Cannot read property 'querySelectorAll' of undefined

I assume the evaluator function is actually passed to eval, so I would expect the following to work in that case:

const result = await page.evaluate(evaluater.toString())

This doesn't work either though. There is no error message, but undefined is returned. If I move the function inline, the data is returned.
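A likely reason the toString() attempt returns undefined: a string passed to page.evaluate is evaluated as an expression, and the source of a function evaluates to the function itself rather than to its return value. A function cannot be serialized back to Node, so undefined comes back. The sketch below shows the same effect with plain eval, no browser needed. The helper name toInvokedExpression and the sample function are hypothetical, and this does not address the separate "document is undefined" error.

```javascript
// Hypothetical helper: wrap a function's source in an immediately
// invoked expression so that evaluating the string runs the function.
function toInvokedExpression(fn) {
  return `(${fn.toString()})()`;
}

// Stand-in for the callback; it avoids `document` so it runs in Node.
const sampleEvaluater = () => 1 + 2;

// Evaluating the bare source yields the function object, not its result.
const bare = eval(sampleEvaluater.toString());

// Evaluating the wrapped source calls the function and yields its result.
const invoked = eval(toInvokedExpression(sampleEvaluater));

console.log(typeof bare); // function
console.log(invoked);     // 3
```

Under that assumption, page.evaluate(toInvokedExpression(evaluater)) would be the string-based equivalent of passing the function directly.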

Is there any way that I can provide a callback to page.evaluate that is not defined inline but passed in as a variable?

Charlie Martin
  • The code you have works fine for me; if I add `.then(r => console.log(r))` I can see the selected objects in the console. Passing a function like that is no problem at all. "doesn't work" is a useless problem description; please clarify *how* the code fails for you. –  May 15 '19 at 23:09
  • The error is `Error: Evaluation failed: TypeError: Cannot read property 'querySelectorAll' of undefined`. In other words, `document` is not defined within the callback. – Charlie Martin May 16 '19 at 02:33
  • I have code very similar to hoangdv's, and puppeteer v1.10.0. Using `document` inside the function passed to `page.evaluate` works perfectly fine. Please edit your question so the code reproduces your problem. –  May 16 '19 at 07:50
  • I tried out the same exact code in latest puppeteer on MacOS and it worked fine. Somehow running chrome on alpine linux in a docker container causes the issue ¯\_(ツ)_/¯ – Charlie Martin May 16 '19 at 10:57

1 Answer

Your code looks fine, and it works in my environment. Your issue may come from the puppeteer version: try removing node_modules and reinstalling the dependencies.

The website you want to crawl may also be blocking crawlers in some way: try testing with another website.

This is my code; you can try it in your environment:

const puppeteer = require('puppeteer');
async function crawler(url, evaluator) {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage()
  await page.goto(url)
  const result = await page.evaluate(evaluator)
  // await browser.close();
  return result

}

(async () => {
  let result = await crawler('https://google.com', () => {
    const nodes = Array.from(document.querySelectorAll('a'));
    return nodes.map(({ innerText }) => innerText)
  });
  console.log(result);
})();
hoangdv
  • I ran this code as is and received `ReferenceError: document is not defined`. It could be an issue with my puppeteer version. I am running puppeteer inside of docker on alpine linux and the latest that is supported there is version `v1.11.0`. I didn't think it was relevant but I guess it is so I will add that to my question. Thanks for your help. – Charlie Martin May 16 '19 at 03:13
  • @CharlieMartin I run my code with `node 11.12.0` and `puppeteer v1.16.0` – hoangdv May 16 '19 at 03:15
  • I'm `node v10.15.3` and `puppeteer v1.11.0`. I think I will just assume this is an issue with the old puppeteer version and work around it. Thanks – Charlie Martin May 16 '19 at 03:21