0

I'm using Puppeteer and trying to use `document.querySelectorAll` to get a list of elements that I can then loop over. However, something is wrong in my code: it returns nothing, `undefined`, or an empty `{}`, even though my elements are on the page. My JS:

let elements = await page.evaluate(() => document.querySelectorAll("div[class^='my-class--']"))
for (let el of Array.from(elements)) {
  // do something
}

What's wrong with my `elements` and `page.evaluate` here?

Ryan H
  • Does this answer your question? [Puppeteer page.evaluate querySelectorAll return empty objects](https://stackoverflow.com/questions/46377955/puppeteer-page-evaluate-queryselectorall-return-empty-objects) – Natnael A. Mar 28 '20 at 17:02

2 Answers

0

As far as I understand, puppeteer returns all the HTML as a giant string. This is because Node doesn't run in the browser so the HTML doesn't get parsed. So DOM selectors won't work.

To work around this, you can use the Cheerio.js module, which lets you select elements with jQuery-style syntax as if the HTML were a parsed DOM.

Ludolfyn
  • This works for me. However, is there any way I can click a button on the webpage after the content has been loaded into Cheerio and then refresh the content that Cheerio has to parse? – Ryan H Mar 28 '20 at 18:23
  • Awesome @RyanHolton ! You can totally interact with the page, but you need to do it within the `page.evaluate()` function. This is **before** you load the content into a variable with `page.content()`—before you load the content for Cheerio. So, inside `page.evaluate()` function you can interact with the DOM like usual (with `document.querySelector()`) and find the button you want to click on and run the `.click()` method e.g. `document.querySelector('#theButton').click()`. It will click on the button and load the new page, but give the page some time to load before grabbing the content. – Ludolfyn Mar 28 '20 at 21:37
  • @RyanHolton You might need to write a function that will run through the process twice if you want to grab the content from both pages. – Ludolfyn Mar 28 '20 at 21:39
  • "As far as I understand, puppeteer returns all the HTML as a giant string" is incorrect, unless you call `page.content()`, which is not the normal way to use Puppeteer. Puppeteer runs the browser in real time and lets you run console code to manipulate the live page. Cheerio is a poor solution, bringing in another package that isn't necessary and requiring you to snapshot the whole page as a giant string just to run a single selector. See [this blog post](https://serpapi.com/blog/puppeteer-antipatterns/#using-a-separate-html-parser-with-puppeteer) and the dupe link for further details. – ggorlen Aug 27 '23 at 19:14
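
A minimal sketch of what the last comment describes: running the selector against the live page instead of snapshotting it as a string. `getTexts` is a hypothetical helper name, not something from the thread, and `page` is assumed to be a Puppeteer `Page` (for example, from `browser.newPage()`). It relies on `page.$$eval`, which hands the matched elements to a callback inside the browser and returns only the callback's result. The question's snippet most likely fails because whatever `page.evaluate` returns has to be serialized on its way back to Node, and DOM nodes do not survive that trip.

```javascript
// Hypothetical helper (not from the original thread): collects the trimmed
// text of every element matching `selector`.
// Assumes `page` is a Puppeteer Page, e.g. from `await browser.newPage()`.
async function getTexts(page, selector) {
  // page.$$eval runs document.querySelectorAll(selector) in the browser,
  // passes the matches to the callback as an array, and sends the
  // callback's return value back to Node. Plain strings survive that
  // trip; DOM nodes do not.
  return page.$$eval(selector, (els) =>
    els.map((el) => el.textContent.trim())
  );
}

// Usage sketch, inside an async function with a page already open:
//   const texts = await getTexts(page, "div[class^='my-class--']");
//   for (const text of texts) {
//     // do something
//   }
```

The loop from the question then iterates over plain strings in Node. If the work needs the elements themselves (clicking, for instance), it probably belongs inside the browser-side callback, or on element handles obtained with `page.$$`.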
-1

Since Puppeteer returns all HTML as a string, you could use `DOMParser`, as in the example below.

let doc = new DOMParser().parseFromString('<template class="myClass"><span  class="target">check it out</span></template>', 'text/html');
let templateContent = doc.querySelector("template");
let template = new DOMParser().parseFromString(templateContent.innerHTML, 'text/html');
let target = template.querySelector("span");
console.log([templateContent,target]);
  • This is misleading and unnecessary. Please see [this comment](https://stackoverflow.com/questions/60903764/puppeteer-queryselectorall-doesnt-get-elements-properly#comment135720655_60903904) – ggorlen Aug 27 '23 at 19:15