Here is my code. Basically, what I want to do is take the HTML and parse it to get the content.

async function main() {
  const browser = await puppeteer.launch({
    headless: false,
    executablePath: EXECUTABLE_PATH,
    devtools: true,
    timeout: 50000,
  });
  const page = await browser.newPage();
  await page.goto(URL);
  //   await page.screenshot({ path: "screenshot.png", fullPage: true });
  const rows = await page.evaluate(() => {
    return [...document.querySelectorAll(".td-block-span6")];
  });
  console.log(rows);
}
main();

console.log is giving me this:

[
  {},
  {},
  {},
  {},
  {},
  {},
  { closure_uid_230013206: 25 },
  { closure_uid_230013206: 22 },
  { closure_uid_230013206: 20 },
  { closure_uid_230013206: 15 }
]

Priyesh Ranjan
    We need a little more info. What is the URL? And what content are you trying to parse? – Benny May 15 '21 at 07:32

2 Answers

Unfortunately, page.evaluate() can only transfer serializable values (roughly, the values JSON can handle). document.querySelectorAll() returns a collection of DOM elements, which are not serializable (they have methods and circular references), so each element in the collection is replaced with an empty object, or with an object holding only a few plain properties such as the closure_uid_230013206 entries in your output. You need to either return a serializable value (for example, an array of text contents or href attributes) or use something like page.$$(selector) and the ElementHandle API.

  const rows = await page.evaluate(() => {
    return [...document.querySelectorAll(".td-block-span6")].map(elem => elem.innerText);
  });
  console.log(rows);
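
To make the difference concrete, here is a small sketch that runs in plain Node without Puppeteer. It assumes a JSON round trip is a fair approximation of what happens to the return value of page.evaluate(); FakeElement and roundTrip are hypothetical names made up for the illustration, and FakeElement only imitates how DOM elements expose their data through prototype getters.

```javascript
// Hypothetical stand-in for a DOM element: like a real element, it exposes
// its text through a prototype getter, not through an own enumerable property.
class FakeElement {
  #text;
  constructor(text) {
    this.#text = text;
  }
  get innerText() {
    return this.#text;
  }
}

// Assumption: a JSON round trip roughly models the browser-to-Node transfer.
function roundTrip(value) {
  return JSON.parse(JSON.stringify(value));
}

const plain = new FakeElement("row 1");
const tagged = new FakeElement("row 2");
// An own enumerable property, like the ones some page scripts attach to elements.
tagged.closure_uid_230013206 = 25;

// Returning the elements themselves loses everything except own properties.
const asElements = roundTrip([plain, tagged]);
// Mapping to strings first keeps the data that was actually wanted.
const asTexts = roundTrip([plain, tagged].map((el) => el.innerText));

console.log(asElements); // [ {}, { closure_uid_230013206: 25 } ]
console.log(asTexts); // [ 'row 1', 'row 2' ]
```

This matches the shape of the output in the question: empty objects for untouched elements, and a lone closure_uid property for elements that a script on the page had tagged.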

Or, keep the elements in the browser and work with element handles:

const rows = await page.$$(".td-block-span6");
for (const row of rows) {
  // process each row as an ElementHandle
}
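
As one possible way to fill in the loop body, the sketch below reads the text of each handle. collectRowTexts is a hypothetical helper name, and it assumes `page` is a Puppeteer Page; it relies only on page.$$(), ElementHandle.evaluate() and ElementHandle.dispose().

```javascript
// Hypothetical helper: `page` is assumed to be a Puppeteer Page object.
async function collectRowTexts(page, selector) {
  // page.$$ resolves to an array of ElementHandle objects (possibly empty).
  const rows = await page.$$(selector);
  const texts = [];
  for (const row of rows) {
    // The callback runs in the browser; only the returned string is sent back.
    texts.push(await row.evaluate((el) => el.innerText.trim()));
    // Release the handle once it is no longer needed.
    await row.dispose();
  }
  return texts;
}
```

Inside main() this would be called as `const rows = await collectRowTexts(page, ".td-block-span6");`. If the handles are not needed for anything else (clicking, screenshots and so on), `page.$$eval(selector, els => els.map(el => el.innerText))` should give a similar array in a single call.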
vsemozhebuty

If you want all the text on the page, this should work:

let text = await document.querySelector('body').innerText;
Benny
    `document.querySelector('body').innerText;` isn't async so there's no point in putting `await` in front of it. OP seems to want `.td-block-span6`, not the whole body text. – ggorlen May 15 '21 at 15:37