I'm trying to load page elements into an array, retrieve the innerHTML of each one, and be able to click on them.

var grabElements = await page.$$(selector);
await grabElements[0].click();

This allows me to grab my elements and click on them but it won't display innerHTML.

var elNum = await page.$$eval(selector, (element) => {
    let n = []
    element.forEach(e => {
        n.push(e);
    })
    return n;  
});
await elNum[0].click();

This lets me get the innerHTML if I push the innerHTML to n. If I push just the element e and try to click it or get its innerHTML outside of the var declaration, it doesn't work. The innerHTML comes back as undefined, and if I click, I get an error saying elNum[index].click() is not a function. What am I doing wrong?

BrainLag
  • The difference is that `$$eval` returns whatever serializable value the callback returns, while `$$` returns the element handles. In other words, you could use `eval` to get `.innerHTML` and `$$` to get clickable handles. But you can also pull HTML from handles and click inside `eval`s with native functions, so it's pretty flexible. Could you show a simple HTML example, with the elements you want to click and the text you're hoping to get? Thanks. – ggorlen Oct 20 '21 at 04:17
  • An example of what I'm trying to get is: `<div class = "class">This is the innerHTML text I want. </div>`. On the page, it's text inside a clickable portion of the website. What I want to do is loop through the available options, then click on the ones that match an innerHTML I'm looking for. – BrainLag Oct 20 '21 at 15:10
  • I've tried using grabElement[0].getProperty('innerHTML').jsonValue() but I keep getting back `.jsonValue() is not a function.`. – BrainLag Oct 20 '21 at 15:24
  • If you want to click something by text, why not use [xpath](https://stackoverflow.com/a/58088028/6243352)? Do you have a link to the page? The specification still seems rather vague and pseudocodey for me to be able to write a runnable, complete answer. – ggorlen Oct 20 '21 at 15:35
  • Right, it could just be a representation of the HTML with any JS necessary to show/hide the element if you can't share the page. Glad you worked it out, anyway. Consider a [self answer](https://stackoverflow.com/help/self-answer) if you think your solution might help the community. – ggorlen Oct 20 '21 at 21:52
  • Sorry, but another similar problem came up. I can't link the page because it's past a login and I'm not exactly comfortable sharing that. If I posted a screencap of the HTML I'm trying to scrape would that help? – BrainLag Oct 20 '21 at 22:12
  • It's the same type of problem but inside a table. Should I just make a new question? – BrainLag Oct 20 '21 at 22:28
  • I'd prefer a copy-paste of the text content of the page, otherwise I'd probably have to type it all in by hand to be able to write code to it. [Canonical explanation](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question) on images of code. Yeah, sounds like a new question. – ggorlen Oct 20 '21 at 23:53
  • Thank you for trying to help. I would link it but it contains information I can't share. – BrainLag Oct 21 '21 at 00:06
  • No problem. I just mean type out whatever's in the screenshot you were already planning on sharing as text. You can change any sensitive info and anonymize as needed. – ggorlen Oct 21 '21 at 00:39
  • Please, could you let me know why you use grabElements[0].click(); and it works? I spent many days coding with for and forEach to click each element and it didn't work. – titoih Oct 28 '22 at 07:22
  • @titoih Maybe ask a new question if you have a new question. What about `grabElements[0].click()` is confusing and/or not working exactly? – ggorlen Dec 02 '22 at 19:35

1 Answer

The difference between page.$$eval (and other evaluate-style methods, with the exception of evaluateHandle) and page.$$ is that the evaluate family only works with serializable values. As you discovered, you can't return elements from these methods because they're not serializable (they have circular references and would be useless in Node anyway).

On the other hand, page.$$ returns Puppeteer ElementHandles, which are references to DOM elements that can be manipulated from Puppeteer's API in Node rather than in the browser. This is useful for many reasons, one of which is that ElementHandle.click() issues a totally different set of operations than the native HTMLElement.click() in the browser: the Puppeteer version scrolls the element into view and sends a trusted mouse click to its center, while the native version only fires an untrusted click event on the element.
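To make the symptom from the question concrete, here is a rough sketch that runs in plain Node with no browser. It is an analogy rather than Puppeteer's actual mechanism: `FakeElement` and `roundTrip` are hypothetical stand-ins, and `JSON.parse(JSON.stringify(...))` only approximates the transfer of a return value from the browser to Node. It shows why an element that crosses that boundary arrives as an empty object, so `innerHTML` is `undefined` and `click` is not a function, while plain strings survive.

```javascript
// Hypothetical stand-in for a DOM element. Like a real element, its data
// is exposed through prototype accessors and methods, not through own
// enumerable properties.
class FakeElement {
  #html;
  constructor(html) {
    this.#html = html;
  }
  get innerHTML() {
    return this.#html;
  }
  click() {
    return "clicked";
  }
}

// Hypothetical helper approximating the browser-to-Node transfer
// (an assumption for illustration, not Puppeteer's real implementation).
const roundTrip = value => JSON.parse(JSON.stringify(value));

const els = [
  new FakeElement("This is the innerHTML text I want."),
  new FakeElement("This is the innerHTML text I don't want."),
];

// Returning the elements themselves: everything useful is lost.
const returnedElements = roundTrip(els);
console.log(returnedElements[0].innerHTML); // undefined
console.log(typeof returnedElements[0].click); // undefined

// Returning strings pulled from the elements: the data survives.
const returnedStrings = roundTrip(els.map(e => e.innerHTML));
console.log(returnedStrings[0]); // This is the innerHTML text I want.
```

This is why the `$$eval` callback should return the strings it needs (or do the clicking itself), while anything that must be clicked from Node should come from `page.$$`.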

From the comments:

An example of what I'm trying to get is: <div class = "class">This is the innerHTML text I want. </div>. On the page, it's text inside a clickable portion of the website. What I want to do is loop through the available options, then click on the ones that match an innerHTML I'm looking for.

Here's a simple example you should be able to extrapolate to your actual use case:

const puppeteer = require("puppeteer"); // ^19.1.0

const html = `
<div>
  <div class="class">This is the innerHTML text I want.</div>
  <div class="class">This is the innerHTML text I don't want.</div>
  <div class="class">This is the innerHTML text I want.</div>
</div>
<script>
document.querySelectorAll(".class").forEach(e => {
  e.addEventListener("click", () => e.textContent = "clicked");
});
</script>
`;

const target = "This is the innerHTML text I want.";

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);

  ///////////////////////////////////////////
  // approach 1 -- trusted Puppeteer click //
  ///////////////////////////////////////////
  const handles = await page.$$(".class");

  for (const handle of handles) {
    if (target === (await handle.evaluate(el => el.textContent))) {
      await handle.click();
    }
  }

  // show that it worked and reset
  console.log(await page.$eval("div", el => el.innerHTML));
  await page.setContent(html);

  //////////////////////////////////////////////
  // approach 2 -- untrusted native DOM click //
  //////////////////////////////////////////////
  await page.$$eval(".class", (els, target) => {
    els.forEach(el => {
      if (target === el.textContent) {
        el.click();
      }
    });
  }, target);

  // show that it worked and reset
  console.log(await page.$eval("div", el => el.innerHTML));
  await page.setContent(html);

  /////////////////////////////////////////////////////////////////
  // approach 3 -- selecting with XPath and using trusted clicks //
  /////////////////////////////////////////////////////////////////
  const xp = '//*[@class="class"][text()="This is the innerHTML text I want."]';

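  // page.$x is available in the pinned ^19.1.0; later Puppeteer releases
  // dropped it in favor of page.$$("xpath/" + xp), so adjust if you upgrade
  for (const handle of await page.$x(xp)) {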
    await handle.click();
  }

  // show that it worked and reset
  console.log(await page.$eval("div", el => el.innerHTML));
  await page.setContent(html);

  ///////////////////////////////////////////////////////////////////
  // approach 4 -- selecting with XPath and using untrusted clicks //
  ///////////////////////////////////////////////////////////////////
  await page.evaluate(xp => {
    // https://stackoverflow.com/a/68216786/6243352
    const $x = xp => {
      const snapshot = document.evaluate(
        xp, document, null,
        XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null
      );
      return [...Array(snapshot.snapshotLength)]
        .map((_, i) => snapshot.snapshotItem(i))
      ;
    };
    $x(xp).forEach(e => e.click());
  }, xp);

  // show that it worked
  console.log(await page.$eval("div", el => el.innerHTML));
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

Output in all cases is:

<div class="class">clicked</div>
<div class="class">This is the innerHTML text I don't want.</div>
<div class="class">clicked</div>

Note that === might be too strict without calling .trim() on the textContent first. You may want an .includes() substring test instead, although the risk there is that it's too permissive. Or a regex may be the right tool. In short, use whatever makes sense for your use case rather than (necessarily) my === test.
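As a rough illustration of those trade-offs, here is a small sketch that runs in plain Node, no browser needed. The sample strings and the predicate names (`strict`, `trimmed`, `substring`, `regex`) are made up for the demonstration. Any of the predicates could stand in for the `===` comparison in approaches 1 and 2.

```javascript
const target = "This is the innerHTML text I want.";

// Made-up samples of what textContent might look like on a real page.
const candidates = [
  "This is the innerHTML text I want.", // exact
  "  This is the innerHTML text I want. \n", // padded with whitespace
  "This is the innerHTML text I don't want.", // different text
  "Note: This is the innerHTML text I want. (updated)", // target embedded
];

// Four ways to decide whether a piece of text matches the target.
const strict = text => text === target;
const trimmed = text => text.trim() === target;
const substring = text => text.includes(target);
const pattern = /^\s*This is the innerHTML text I want\.\s*$/;
const regex = text => pattern.test(text);

const results = {
  strict: candidates.map(strict),
  trimmed: candidates.map(trimmed),
  substring: candidates.map(substring),
  regex: candidates.map(regex),
};

console.log(results);
// strict:    [true, false, false, false] misses the padded text
// trimmed:   [true, true, false, false]
// substring: [true, true, false, true]   also accepts the embedded text
// regex:     [true, true, false, false]
```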

With respect to the XPath approach, this answer shows a few options for dealing with whitespace and substrings.
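For a sense of what such options can look like, the sketch below builds two XPath variants as strings. The helper names (`xpathLiteral`, `exactTextXPath`, `containsTextXPath`) are hypothetical and are not taken from the linked answer; `normalize-space()` and `contains()` are standard XPath 1.0 functions. The resulting strings could be used in place of `xp` in approaches 3 and 4.

```javascript
// Hypothetical helper: XPath 1.0 has no escape character inside string
// literals, so pick whichever quote style the text doesn't contain and
// fall back to concat() when it contains both kinds.
const xpathLiteral = s => {
  if (!s.includes('"')) return `"${s}"`;
  if (!s.includes("'")) return `'${s}'`;
  return `concat(${s.split('"').map(p => `"${p}"`).join(`, '"', `)})`;
};

// Exact match that ignores leading/trailing whitespace and collapses
// internal runs of whitespace. Note that normalize-space() looks at the
// element's whole string value, including text of descendant elements.
const exactTextXPath = (cls, text) =>
  `//*[@class=${xpathLiteral(cls)}][normalize-space()=${xpathLiteral(text)}]`;

// Substring match: more permissive, with the same whitespace handling.
const containsTextXPath = (cls, text) =>
  `//*[@class=${xpathLiteral(cls)}]` +
  `[contains(normalize-space(), ${xpathLiteral(text)})]`;

const exact = exactTextXPath("class", "This is the innerHTML text I want.");
const partial = containsTextXPath("class", "text I want");

console.log(exact);
// //*[@class="class"][normalize-space()="This is the innerHTML text I want."]
console.log(partial);
// //*[@class="class"][contains(normalize-space(), "text I want")]
```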

ggorlen