How to select element with nested elements which fulfill a condition in Puppeteer?

Question

I have a list of elements with tag A in my HTML, each of which has nested elements with tag B. Using Puppeteer, how do I select the A element whose nested B element fulfills a specific condition?

HTML example:

<A>
    <B>1</B>
</A>
<A>
    <B>2</B>
</A>

To get the A element containing the B with innerText "1", I've tried:

const element = await page.evaluate(() => {
    return [...document.querySelectorAll("A > B[innerText='1']")];
});
console.log(element); // undefined

Answers without jQuery are preferred.

ggorlen · Accepted Answer · 2023-03-20T16:56:32.230

It's good that you've provided a mockup of the HTML, but important details can be lost in translation. The best way to get an accurate answer is to share the actual site. If that isn't possible, showing an excerpt of the actual HTML is often enough.

In many cases, adding more context reveals a much better way to get the result you want. Details you might deem unimportant can turn out to be critical.

With that in mind, at the time of writing Puppeteer's text-selection options are fairly limited relative to Playwright's (arguably a good thing, since there are fewer abstractions to remember). They are the "text/" prefix, XPaths, and CSS selectors combined with DOM traversal.

I usually start with a CSS selector and DOM traversal approach since it's easiest to remember for me:

const puppeteer = require("puppeteer"); // ^19.7.5

const html = `
<A>
    <B>1</B>
</A>
<A>
    <B>2</B>
</A>
`;

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);
  const el = await page.evaluateHandle(() =>
    [...document.querySelectorAll("A")].find(el =>
      [...el.querySelectorAll("B")].find(
        e => e.textContent.trim() === "1"
      )
    )
  );
  console.log(await el.evaluate(el => el.outerHTML)); // just to verify
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
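The callback passed to evaluateHandle can be factored into a reusable function. The sketch below is one possible version under a few assumptions: findParentByChildText and its parameters are hypothetical names, and it swaps the inner find() for some(), since only a yes/no answer is needed there. Because it calls parent.querySelectorAll(childSelector), it matches <B> at any depth inside <A>, not only direct children. The mockElement helper is a made-up stand-in for DOM nodes so the logic can be tried in plain Node without a browser.

```javascript
// Hypothetical helper: returns the first element matching parentSelector that has
// a descendant matching childSelector whose trimmed textContent equals text.
// Puppeteer serializes page functions, so this must not use outer variables;
// `root` falls back to `document` when the function runs in the page.
function findParentByChildText(parentSelector, childSelector, text, root = document) {
  return [...root.querySelectorAll(parentSelector)].find(parent =>
    // some() only needs a boolean, so it replaces the inner find()
    [...parent.querySelectorAll(childSelector)].some(
      child => child.textContent.trim() === text
    )
  );
}

// Minimal stand-in for DOM nodes so the helper can be tried in plain Node.
// It supports only what the helper uses: querySelectorAll by exact tag name
// (all descendants, in document order) and a textContent getter.
function mockElement(tagName, children = [], text = "") {
  return {
    tagName,
    children,
    get textContent() {
      return text + children.map(child => child.textContent).join("");
    },
    querySelectorAll(tag) {
      return children.flatMap(child =>
        child.tagName === tag
          ? [child, ...child.querySelectorAll(tag)]
          : child.querySelectorAll(tag)
      );
    },
  };
}

const doc = mockElement("#document", [
  mockElement("A", [mockElement("B", [], "1")]),
  mockElement("A", [mockElement("B", [], "2")]),
]);

console.log(findParentByChildText("A", "B", "2", doc) === doc.children[1]); // true
```

Against a live page, the function itself could be handed to Puppeteer, which forwards the extra arguments and lets root fall back to the page's document: `const el = await page.evaluateHandle(findParentByChildText, "A", "B", "1");`. Since Puppeteer serializes page functions, the helper must not close over any Node-side variables.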

If you don't mind XPath syntax, you could use this shorter alternative instead:

const el = await page.$('xpath///A[B[normalize-space() = "1"]]');

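If <B> can be deeply nested within <A>, make the inner path relative with .// (a bare // inside the predicate searches from the document root, so every <A> would match as soon as any <B> on the page has the text):

const el = await page.$('xpath///A[.//B[normalize-space() = "1"]]');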
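The XPath predicates compare with normalize-space(), which does more than the .trim() call in the DOM-traversal code: it also collapses internal runs of whitespace into single spaces. For multi-word text, a helper along the lines of the sketch below (normalizeSpace is a hypothetical name) could replace .trim() in the find callback. Because Puppeteer serializes page functions, it would have to be defined inside the evaluateHandle callback rather than on the Node side.

```javascript
// Sketch of XPath 1.0 normalize-space() in JavaScript: collapse every run of
// XPath whitespace (space, tab, CR, LF) into one space, then drop a leading
// and a trailing space. Unlike String.prototype.trim(), it leaves characters
// such as U+00A0 (no-break space) untouched.
function normalizeSpace(s) {
  return s.replace(/[ \t\r\n]+/g, " ").replace(/^ | $/g, "");
}

console.log(JSON.stringify(normalizeSpace("\n    foo \t  bar\n"))); // "foo bar"
console.log(JSON.stringify("\n    foo \t  bar\n".trim()));         // "foo \t  bar"
```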

If you don't need an ElementHandle, you can just use $$eval or evaluate directly and return the text or other serializable data you need.
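As a sketch of the $$eval route, the callback below receives the array of matched <A> elements followed by any extra arguments and returns plain text instead of an element. The name pickMatchingText and returning the <A> element's trimmed text are illustrative choices, not anything Puppeteer prescribes; the fakeParents objects are made-up stand-ins so it can run in Node.

```javascript
// Hypothetical callback for page.$$eval("A", pickMatchingText, "B", "1").
// $$eval calls it in the page with the array of matched elements first,
// then the extra arguments; the return value must be serializable.
function pickMatchingText(parents, childSelector, text) {
  const match = parents.find(parent =>
    [...parent.querySelectorAll(childSelector)].some(
      child => child.textContent.trim() === text
    )
  );
  // Return plain data (a string, or null when nothing matched), not an element
  return match ? match.textContent.trim() : null;
}

// Plain objects standing in for <A> elements, only to try the callback in Node
const fakeParents = [
  { textContent: "\n    1\n", querySelectorAll: () => [{ textContent: "1" }] },
  { textContent: "\n    2\n", querySelectorAll: () => [{ textContent: "2" }] },
];

console.log(pickMatchingText(fakeParents, "B", "2")); // 2
console.log(pickMatchingText(fakeParents, "B", "3")); // null
```

With a live page this might be called as `const text = await page.$$eval("A", pickMatchingText, "B", "1");`. Like any page function, it must not rely on variables from the Node side.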

See this answer for more options for XPath text extraction in Puppeteer.
