It's good that you've provided a mockup of the HTML, but important details can be lost in translation. The best way to get an accurate answer is to share the actual site. If that isn't possible, showing an example of the actual HTML can often be enough.
In many cases, adding more context reveals a much better way to get the result you want. Some of the details you deem unimportant may turn out to be critical.
With that in mind, Puppeteer's options for text selection at the time of writing are fairly limited relative to Playwright (arguably a good thing; fewer abstractions to remember): the "text/" prefix, XPaths, and CSS selectors with DOM traversal.
I usually start with a CSS selector and DOM traversal approach since it's the easiest for me to remember:
const puppeteer = require("puppeteer"); // ^19.7.5

const html = `
<A>
  <B>1</B>
</A>
<A>
  <B>2</B>
</A>
`;

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);
  const el = await page.evaluateHandle(() =>
    [...document.querySelectorAll("A")].find(el =>
      [...el.querySelectorAll("B")].find(
        e => e.textContent.trim() === "1"
      )
    )
  );
  console.log(await el.evaluate(el => el.outerHTML)); // just to verify
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
If you don't mind XPath syntax, you could instead use the shorter:
const el = await page.$('xpath///A[B[normalize-space() = "1"]]');
If <B> can be deeply nested within <A>:
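const el = await page.$('xpath///A[.//B[normalize-space() = "1"]]');
Note the leading . in .//B: it keeps the search relative to each <A>. A bare //B inside the predicate would search the whole document, so every <A> would match as long as any matching <B> exists anywhere on the page.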
If you don't need an ElementHandle, you can just use $$eval or evaluate directly and return the text or other serializable data you need.
See this answer for more options for XPath text extraction in Puppeteer.