I've been looking for a solution to this, and found a few here that focus on clicking an element, but none that allow for clicking an element based on a link.

Using Puppeteer, I'm looping over an array of tabs:

<div role="tablist">
    <div><a href="#one" tabindex="-1" role="tab" aria-selected="false" class="">One</a></div>
    <div><a href="#two" tabindex="-1" role="tab" aria-selected="false" class="">Two</a></div>
    <div><a href="#three" tabindex="0" role="tab" aria-selected="true" class="icn-cv-down">three</a></div>
</div>

I'm able to grab the URL or hash of each tab, but I get the error link.click() is not a function. I believe this is because Puppeteer can't trigger a click the same way browser JS does, but I'm unsure of the way forward:

let tabs = await page.evaluate(() => {
  var tab = [...document.querySelectorAll('[role="tablist"] a')].map(
    (el) => el.hash
  );
  return tab;
});
let components = [];
if (tabs) {
  tabs.forEach((link, index) => {
    setTimeout(() => {
      link.click();
      components.push(
        [...document.querySelectorAll(".ws-compid")]
          .map((component) => component.innerText)
          .filter((el) => el !== "")
      );
    }, 200 * index);
  });
}
console.log(components);

I believe I need an async function to trigger the click event, but I'm not sure. The script should click each tab by its href value, then push the values that appear on the page into the components array.

Matt
  • `link` is presumably just a string here (values of `hash`), and strings have no `.click()` method. You're then attempting to access `document` inside of Node. That won't work. If you share the page and show what your expected result is, I can help show what you can do to get that result rather than merely telling you what not to do. – ggorlen Jan 04 '23 at 22:26
  • @ggorlen - Essentially, `tabs` is an array of links from tabbed navigation. I need to be able to loop over each one (like I'm doing now), and have Puppeteer click each one, then grab the values of `.ws-compid` (like I'm doing now) and push those values in the `components` array. This all works in Chrome dev tools, but need to get it working in my puppeteer script. – Matt Jan 04 '23 at 22:31
  • I understand that you want to click some tabs and extract some values, but without seeing the page you're asking me to hit a target in the dark. I can't execute anything to provide verifiable, working code. If it all works in dev tools, then the easiest approach is to plop your dev tools code inside an `evaluate()` callback without modification, and be sure to wait for any necessary selectors as the page loads. But just because code works in dev tools offers no guarantee it'll work in Puppeteer, even in an `evaluate`, for many reasons (bot detection, iframes, async loading, shadow DOM...). – ggorlen Jan 04 '23 at 22:33
  • Updated my question with sample markup – Matt Jan 04 '23 at 22:37
  • Using that markup, you should be able to make a simple page locally to test. Unfortunately, the page for this isn't currently live. – Matt Jan 04 '23 at 22:40
  • Actually, there's no way this code would work in the browser, either. `hash` is still going to be a string and you can't click strings. It'd be much better if you made the simple page so it actually reflects the thing you're working with. – ggorlen Jan 04 '23 at 22:47

1 Answer

I can't run your page to see what the actual behavior is, but based on the limited information provided, here's my best attempt at piecing together a working example you can adapt to your use case:

const puppeteer = require("puppeteer"); // ^19.1.0

const html = `
<div role="tablist">
  <div><a href="#one" tabindex="-1" role="tab" aria-selected="false" class="">One</a></div>
  <div><a href="#two" tabindex="-1" role="tab" aria-selected="false" class="">Two</a></div>
  <div><a href="#three" tabindex="0" role="tab" aria-selected="true" class="icn-cv-down">three</a></div>
  <div class="ws-compid"></div>
</div>
<script>
document.querySelectorAll('[role="tablist"] a').forEach(e => 
  e.addEventListener("click", () => {
    document.querySelector(".ws-compid").textContent = e.textContent;
  })
);
</script>`;

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);
  const components = await page.evaluate(() =>
    Promise.all(
      [...document.querySelectorAll('[role="tablist"] a')].map(
        (e, i) =>
          new Promise(resolve =>
            setTimeout(() => {
              e.click();
              resolve(
                [...document.querySelectorAll(".ws-compid")]
                  .map(component => component.innerText)
                  .filter(e => e)
              );
            }, 200 * i)
          )
      )
    )
  );
  console.log(components); // => [ [ 'One' ], [ 'Two' ], [ 'three' ] ]
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

A lot can go wrong translating browser code to Puppeteer: asynchronous loading, bot detection, iframes, shadow DOM, to name a few obstacles, so if this doesn't work, I'll need a reproducible example.

Although you claim your original code works, I don't see how that's possible. The pattern boils down to:

const tabs = [..."ABCDEF"];
let components = [];
tabs.forEach((link, index) => {
  setTimeout(() => {
    components.push(link);
  }, 200 * index);
});
console.log(components); // guaranteed to be empty

// added code
setTimeout(() => {
  console.log(components.join("")); // "ABCDEF"
}, 2000);

You can see that console.log(components) runs before any of the setTimeout callbacks fire. Only after adding an artificial delay do we see components filled as expected. See the canonical thread How do I return the response from an asynchronous call?. One solution is to promisify the callbacks, as I've done in the Puppeteer example above.
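To make the promisification concrete, here is a minimal sketch of the same boiled-down pattern with each timer wrapped in a promise. The helper names `sleep` and `collect` are illustrative, not part of the original code, and the delay is shortened so it runs quickly:

```javascript
// Wrap setTimeout in a promise so callers can await it.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Resolve one value per tab, staggered by delayMs * index.
// Promise.all keeps the results in input order.
async function collect(tabs, delayMs = 200) {
  return Promise.all(
    tabs.map(async (link, index) => {
      await sleep(delayMs * index);
      return link;
    })
  );
}

// The array is only read after every timer has fired.
collect([..."ABCDEF"], 20).then(components => {
  console.log(components.join("")); // "ABCDEF"
});
```

The key difference from the forEach version is that the result is consumed inside `.then` (or after an `await`), so nothing reads the array before it has been filled.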

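Note also that sleeping for a fixed 200 milliseconds isn't ideal: it's slower than necessary when the page updates quickly and unreliable when it updates slowly. You can surely speed this up with a waitForFunction.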
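As a rough sketch of what that could look like, the function below clicks each tab from Node and waits for the page to change rather than sleeping. It assumes `page` is a Puppeteer Page that already shows the tabs, and that each click changes the text of the `.ws-compid` elements, as in the mock page above. If two tabs produce identical text, the wait would time out, so the predicate would need adapting for a real page. `collectAfterEachClick` is a hypothetical helper name.

```javascript
// Click every tab in turn and collect the .ws-compid text after each click.
// Assumption: clicking a tab changes the .ws-compid text on the page.
async function collectAfterEachClick(page) {
  const tabs = await page.$$('[role="tablist"] a');
  const components = [];
  let previous = ""; // joined text seen after the previous click

  for (const tab of tabs) {
    // Trigger a native DOM click inside the browser.
    await tab.evaluate(el => el.click());

    // Wait until the text is non-empty and differs from the last snapshot.
    await page.waitForFunction(
      prev => {
        const text = [...document.querySelectorAll(".ws-compid")]
          .map(el => el.textContent.trim())
          .join("|");
        return text !== "" && text !== prev;
      },
      {},
      previous
    );

    // Read the current values, dropping empty strings.
    const texts = await page.$$eval(".ws-compid", els =>
      els.map(el => el.textContent.trim()).filter(Boolean)
    );
    previous = texts.join("|");
    components.push(texts);
  }
  return components;
}
```

Called as `await collectAfterEachClick(page)` after `page.setContent(html)` in the example above, this should yield the same nested array of tab texts, without depending on timer arithmetic.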


In the comments, you shared a site that has similar tabs, but you don't need to click anything to access the text that's revealed after each click:

const puppeteer = require("puppeteer");

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const url = "https://www.w3.org/WAI/ARIA/apg/example-index/tabs/tabs-manual.html";
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const text = await page.$$eval(
    '#ex1 [role="tabpanel"]',
    els => els.map(e => e.textContent.trim())
  );
  console.log(text);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

So there's a good chance this is an XY problem.

ggorlen
  • This is running without an error, except the components array is returning empty, and doesn't appear to be clicking the tab links. – Matt Jan 04 '23 at 22:51
  • This isn't the page that my code is running on, but for example: https://www.w3.org/WAI/ARIA/apg/example-index/tabs/tabs-manual.html. If you run let tabs = [...document.querySelectorAll('[role="tablist"] button')] let components = [] if(tabs){ tabs.forEach((link, index) => { setTimeout(() => { link.click(); components.push([...document.querySelectorAll('.ws-compid')].map(component => component.innerText).filter(el => el !== "")) }, 200 * index); }); } that will trigger the tabs there (though those are buttons and not anchor links) – Matt Jan 04 '23 at 23:53
  • The components that should be logged out are not the tabs, but component ids within the tab content of each clicked tab. Of course, the important thing in this problem is being able to click the tab, and not the content being scraped. – Matt Jan 04 '23 at 23:54
  • Doesn't my example do all of that? The code you pasted above still fails to properly wait for the `setTimeout` callbacks to complete, as I explained in my post. – ggorlen Jan 04 '23 at 23:57
  • BTW, in the example site you linked, there's no need to click the tabs in order to extract the text that appears in the tab. `await page.$$eval('#ex1 [role="tabpanel"]', els => els.map(e => e.textContent.trim()))` is all you need. See my update. This is another example of why seeing the actual site is important, if it's not clear to you yet. Often, there's a far easier way to get the data you want than the approach you assume is needed. – ggorlen Jan 05 '23 at 00:05
  • Ended up getting it working with your refactor, thanks! – Matt Jan 05 '23 at 00:14