
I'm trying to get some information from a website (https://www.bauhaus.info/) but I'm stuck at the cookie popup form.

This is my code so far:

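const fs = require("fs");
const puppeteer = require("puppeteer");

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.bauhaus.info');
    await sleep(5000);
    const html = await page.content();
    fs.writeFileSync("./page.html", html, "utf-8");
    // page.pdf() returns a promise, so await it before closing the browser
    await page.pdf({
        path: './bauhaus.pdf',
        format: 'a4'
    });
    await browser.close();
})();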

function sleep(ms) {
    return new Promise((resolve) => {
        setTimeout(resolve, ms);
    });
}

Up to this point everything works fine. But I can't accept the cookie banner, because the banner's HTML doesn't show up in what Puppeteer returns, even though I can see the form in the PDF.

[Screenshot: my browser]

[Screenshot: Puppeteer]

Why can't I see this popup in the HTML code? Bonus question: is there any way to replace the sleep function with one of Puppeteer's page wait methods, without knowing which JS function triggers the cookie form to appear?

Raphael
  • Sleep: await page.waitForTimeout(4000) – Konrad May 08 '22 at 21:16
  • Why no popup in HTML? This popup is loaded through js and you are saving initial HTML – Konrad May 08 '22 at 21:16
  • How do you try to close the banner? – Konrad May 08 '22 at 21:17
  • It's in the shadow DOM. See something like [Puppeteer not giving accurate HTML code for page with shadow roots](https://stackoverflow.com/questions/68525115/puppeteer-not-giving-accurate-html-code-for-page-with-shadow-roots/68540701#68540701) which has an explanation and a ton of resources. Also, try to avoid sleeping if you can possibly help it -- it's slow and unreliable. – ggorlen May 08 '22 at 21:39
  • Also, please only ask one question per post. That said, I don't know what you mean by the "bonus quest". – ggorlen May 08 '22 at 21:56

1 Answer


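This element is in a shadow root, which is why it's missing from the page.content() output: the serialized HTML doesn't include shadow DOM content. See my answer to "Puppeteer not giving accurate HTML code for page with shadow roots" for additional information about the shadow DOM.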

This code dips into the shadow root, waits for the button to appear, then clicks it. Optionally, it waits for the element to be removed, then snaps a screenshot.

const puppeteer = require("puppeteer"); // ^19.11.1

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const url = "https://www.bauhaus.info/";
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const el = await page.waitForSelector("#usercentrics-root");
  const sel = '[data-testid="uc-accept-all-button"]';
  await page.waitForFunction((el, sel) =>
    el.shadowRoot.querySelector(sel),
    {},
    el,
    sel,
  );
  await el.evaluate((el, sel) =>
    el.shadowRoot.querySelector(sel).click(),
    sel
  );

  // to prove it worked, wait for the popup
  // to disappear, then take a screenshot
  const root = await page.waitForSelector("#usercentrics-root");
  await page.waitForFunction((root, sel) =>
    !root.shadowRoot.querySelector(sel), {}, root, sel
  );
  await page.screenshot({path: "clicked.png"});
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
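
The traversal above hard-codes the shadow host, #usercentrics-root. When the host isn't known in advance, a recursive search through open shadow roots is one option. The following is only a sketch: queryDeep is a hypothetical helper name (not a Puppeteer API), it relies only on the standard querySelector, querySelectorAll and shadowRoot DOM members, and it can't see into closed shadow roots.

```javascript
// Hypothetical helper (not part of Puppeteer): returns the first element
// matching `selector`, descending into open shadow roots, or null.
function queryDeep(root, selector) {
  // Try the light DOM of this root first
  const direct = root.querySelector(selector);
  if (direct) return direct;

  // Then recurse into every open shadow root below this root
  for (const el of root.querySelectorAll("*")) {
    if (el.shadowRoot) {
      const found = queryDeep(el.shadowRoot, selector);
      if (found) return found;
    }
  }
  return null;
}
```

The function has to run in the page context, so in Puppeteer it would need to be defined inside a page.evaluate or page.waitForFunction callback rather than in the Node script itself.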

Since the time of the original post, Puppeteer has added an easier way to traverse the shadow DOM, the >>> combinator:

// ...
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const sel = '[data-testid="uc-accept-all-button"]';
  const btn = await page.waitForSelector(">>> " + sel);
  await btn.click();
// ...
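
A consent banner may not appear on every visit, so it can help to wrap the snippet above in a function that gives up quietly after a timeout. This is a sketch under assumptions: acceptCookies is a hypothetical helper name, the 10 second default is arbitrary, and it assumes that Puppeteer's timeout errors carry the name "TimeoutError".

```javascript
// Hypothetical helper: clicks the consent button if it shows up within
// `timeout` ms and resolves to whether it did. Assumes a Puppeteer
// version that supports the ">>>" combinator in waitForSelector.
async function acceptCookies(page, sel, timeout = 10000) {
  try {
    const btn = await page.waitForSelector(">>> " + sel, {timeout});
    await btn.click();
    return true;
  } catch (err) {
    // Assumption: a timeout is reported as an error named "TimeoutError"
    if (err.name === "TimeoutError") return false;
    throw err; // anything else is a real failure
  }
}

// Usage, with `page` already navigated:
//   await acceptCookies(page, '[data-testid="uc-accept-all-button"]');
```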

Thinking outside the box, if you don't really need to click the button and just need the modal out of the way as quickly and easily as possible, you can blast away the whole outer container, shadow root and all:

// ...
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const el = await page.waitForSelector("#usercentrics-root");
  await el.evaluate(el => el.remove());
// ...

This is an underrated technique: if part of the page is getting in the way and is unrelated to your goal, just rip it out and forget about it! This is in a similar spirit to blocking unneeded resources. You don't have to use the site as intended.
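
For comparison, blocking unneeded resources might look roughly like the sketch below, which uses Puppeteer's request interception. blockResourceTypes is a hypothetical helper name and the default list of blocked types is an assumption; what is safe to block depends on the site.

```javascript
// Hypothetical helper: aborts requests for resource types the scrape
// doesn't need and lets everything else through.
async function blockResourceTypes(page, blocked = ["image", "font", "media"]) {
  await page.setRequestInterception(true);
  page.on("request", request => {
    if (blocked.includes(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}

// Usage: call it before page.goto() so the first requests are intercepted.
//   await blockResourceTypes(page);
//   await page.goto(url, {waitUntil: "domcontentloaded"});
```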

Another step further outside the box: depending on what you're really on the site to accomplish, you can often do it using native untrusted DOM methods such as .click() inside evaluate blocks, which don't care about visibility. This means you can potentially ignore the modal entirely.
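
As a rough illustration of that idea, the sketch below clicks an element from inside the page with page.$eval. clickViaDom is a hypothetical helper name and the selector in the usage comment is made up; note that page.$eval throws if nothing matches the selector.

```javascript
// Hypothetical helper: triggers the click with the DOM's own el.click()
// inside the page, instead of a simulated mouse click, so an overlay
// covering the element doesn't get in the way.
async function clickViaDom(page, selector) {
  await page.$eval(selector, el => el.click());
}

// Usage, with `page` already navigated (selector is illustrative only):
//   await clickViaDom(page, "a.some-link");
```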

See also: "Can't locate and click on a terms of conditions button", which uses the same #usercentrics-root.

ggorlen