
I have a line of code that looks like this:

await page.$$eval("a", as => as.find(a => a.innerText.includes("shop")).click());

This clicks the "shop" link and works fine. But if the link text is written as "S&#65279h&#65279op", Puppeteer can't find it. Is it possible to ignore &#65279 so that Puppeteer only sees "shop"?

takendarkk

1 Answer


You can decode the innerText using DOMParser. Example copied from this answer.

window.getDecodedHTML = function getDecodedHTML(encodedStr) {
  const parser = new DOMParser();
  const dom = parser.parseFromString(
    `<!doctype html><body>${encodedStr}`,
    "text/html"
  );
  return dom.body.textContent;
}

Save the snippet above to a file such as script.js and inject it into the page so it is easier to use.

const fs = require('fs');
await page.evaluate(fs.readFileSync('script.js', 'utf8'));

Now you can use it to decode the innerText.

await page.$$eval("a", as => as.find(a => getDecodedHTML(a.innerText).includes("shop")).click());

This solution might not be optimal, but it should work.

Here is another snippet that doesn't require DOMParser.

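// Replace decimal numeric entities such as "&#65279;" with the character they encode.
// The trailing semicolon is optional, to match the form shown in the question.
window.getDecodedHTML = function(str) {
  return str.replace(/&#(\d+);?/g, function(match, dec) {
    return String.fromCharCode(parseInt(dec, 10));
  });
};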
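
One caveat with both snippets, as far as I can tell: &#65279 is U+FEFF (zero-width no-break space), and innerText normally returns text with entities already decoded, so the invisible character itself may still be sitting between the letters. The capital "S" would also defeat includes("shop"). Here is a minimal sketch that strips a few common invisible characters and lowercases before comparing. The names normalizeText and clickLinkByText are hypothetical helpers, not part of Puppeteer, and the character list is an assumption you may need to extend.

```javascript
// Hypothetical helper: remove common invisible characters
// (U+200B..U+200D, U+2060, U+FEFF) and lowercase the result.
function normalizeText(str) {
  return str.replace(/[\u200B-\u200D\u2060\uFEFF]/g, "").toLowerCase();
}

// Sketch of use with Puppeteer; `page` is assumed to be a Puppeteer Page.
// The callback runs in the browser, so the normalization is repeated inline
// there instead of calling normalizeText from the Node side.
async function clickLinkByText(page, needle) {
  return page.$$eval("a", (links, wanted) => {
    const link = links.find(a =>
      a.innerText
        .replace(/[\u200B-\u200D\u2060\uFEFF]/g, "")
        .toLowerCase()
        .includes(wanted)
    );
    if (link) link.click(); // avoid a TypeError when nothing matches
    return Boolean(link);   // tell the caller whether a link was clicked
  }, needle.toLowerCase());
}
```

You would then call it as await clickLinkByText(page, "shop") and check the returned boolean to see whether a matching link was found.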
Md. Abu Taher