After programming for a long time, I am now trying to get practical with JavaScript. I want to load a page, click a button, and then inspect the available variables to check that they look right.

I was able to do this successfully for a normal page visit. However, after simulating a button click that generates a popup, I can no longer retrieve the variable.

Here are the steps that I want to take:

  1. Open the page, accept cookies
  2. Check the dataLayer
  3. Click a button to add a product in a basket (generates some kind of popup/something over the normal page)
  4. Check the dataLayer again

Step 2 works, but step 4 returns undefined. I suspect the focus may be on the wrong thing, but I have not found a way to resolve this.

Here is the minimal code that reproduces the problem, I swapped out the link with example.com to be safe.

// Loading dependencies
const puppeteer = require('puppeteer');

// Opening the browser
(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null // null disables the default 800x600 viewport
  })

  // Simulating the user behavior
  const page = await browser.newPage()
  await page.goto('https://www.example.com/')
  await page.click('button#onetrust-accept-btn-handler')
  let dataLayer1 = await page.evaluate(() => {
    return window.dataLayer
  })
  await page.click('button.button.button--solid.button--custom.button--color-primary.add-to-cart')

  let dataLayer2 = await page.evaluate(() => {
    return window.dataLayer
  })
  
  console.log('First output')
  console.log(dataLayer1)
  console.log('Second output')
  console.log(dataLayer2)
  console.log('Finished')

})();

I am not sure if it is relevant, but I am running on Windows and have updated everything to the latest versions.

Dennis Jaheruddin
  • "Here is the minimal code that reproduces the problem, I swapped out the link with example.com to be safe." -- problem is, the link is part of what's needed to reproduce the problem. If you can't provide the link, can you download the HTML/CSS/JS of that page and strip it down to the relevant parts, then `setContent` to inject it? If I can't run it, I basically can't help, since there's no obvious errors in the code shown here. – ggorlen Sep 30 '22 at 18:19
  • Ah, I had hoped that clicking something that creates an overlay screen/frame was a known challenge, if not then I will try to reproduce it on a less sensitive website – Dennis Jaheruddin Sep 30 '22 at 20:13
  • How the page happens to manipulate its custom `dataLayer` variable, when/where the popup/modal/overlay shows up and so forth is unclear. – ggorlen Sep 30 '22 at 20:52
  • It is not that custom, I already found that if you inspect the page source of stackoverflow you also can search for this dataLayer. Now I still need to find a place where a sort of popup/overlay is generated to check if the problem in accessing it also occurs here. – Dennis Jaheruddin Oct 01 '22 at 02:16
  • I see. Apparently it's a Google Tag Manager thing. I'm not sure when it'd be updated based on an overlay, so you might want to edit/tag your post to be very clear about this so Google Tag Manager subject matter experts can take a look. I assume it's relevant since the relationship between the overlay appearing and `dataLayer` changing seems domain-specific to GTM, at least without a general runnable example. Why do you need to access this (what are you trying to achieve) anyway? – ggorlen Oct 01 '22 at 02:27
  • If `dataLayer` is an object that you're expecting to change, you can try using `waitForFunction` to poll until it changes after the click. It seems weird that `dataLayer` would suddenly be undefined though. Seems like the sort of thing that's permanently attached to the window as the client uses the website. Focus shouldn't matter when you're just pulling out a variable from the window, unless the website is explicitly setting the variable to be undefined after the click or based on focus/blur, which seems unusual to me. – ggorlen Oct 01 '22 at 02:42
  • I don't see the site doing anything to clear the variable, and when inspecting the dataLayer via the developer tools in chrome (also in the one controlled by puppeteer) it actually shows it properly still after opening the popup. – Dennis Jaheruddin Oct 01 '22 at 20:34
  • It does. Thanks a bunch. The problem seems to be that `dataLayer`'s structure becomes circular after the click because there's a React fiber and/or DOM node in there, and it can't be serialized. Puppeteer returns `undefined` when there's a circular reference, as you can see here: `await page.evaluate(() => {const o = {}; o.a = o; return o;});`. I'm back to my earlier question: Why do you need to access this (what are you trying to achieve) anyway? This is important to know so I can provide a workaround or solution that meets your underlying goals. Maybe you only want some of the data, for example. – ggorlen Oct 01 '22 at 21:12
  • See [xy problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/233676#233676) to help motivate why providing context is important. Another idea is doing whatever processing you might be doing inside of the browser where the structure is fully intact. – ggorlen Oct 01 '22 at 21:15
  • What I want is to check the dataLayer (this is the point from which some people can pull data). At this stage it would suffice to test whether it has an element in there called 'becomingSkynet'; later the content of this could prove interesting as well. – Dennis Jaheruddin Oct 01 '22 at 21:32
  • Where in the object am I supposed to find this "becomingSkynet"? Is it a key, a value...? Are you trying to write a test? There's a lot of back and forth here, and if you don't mind taking the time to [edit] the post and provide a complete specification for what you are trying to accomplish, I can probably help get you a solution much quicker than this. – ggorlen Oct 01 '22 at 21:40
  • Anyway, if your original question was "why undefined" then this should be resolved. – ggorlen Oct 01 '22 at 21:47
  • I don't see any "becomingSkynet" substring in the 2.5 megabyte JSON structure parsed with MDN's [circular structure pruner](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Errors/Cyclic_object_value#examples). – ggorlen Oct 02 '22 at 15:10

1 Answer

The problem here is pretty simple: all data returned from `evaluate` callbacks needs to be serializable. It's a classic Puppeteer gotcha to assume DOM elements are serializable, but they aren't, due to circular references and dependencies on native browser objects that won't work in Node, like `document` and `window`.

The tricky part is that Puppeteer fails silently when you try to return an object with a circular reference, defaulting to `undefined`. A minimal example is

const o = await page.evaluate(() => {
  const o = {};
  o.o = o; // circular reference
  return o;
});
console.log(o); // => undefined
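
One way to confirm this diagnosis is to test the value with `JSON.stringify`, which throws on circular structures rather than failing silently. The following is a minimal sketch in plain Node; `isJsonSerializable` is a hypothetical helper name used for illustration, not part of Puppeteer.

```javascript
// Hypothetical helper: reports whether a value survives JSON serialization.
function isJsonSerializable(value) {
  try {
    JSON.stringify(value);
    return true;
  } catch (err) {
    // JSON.stringify throws a TypeError on circular structures
    return false;
  }
}

// A plain, tree-shaped value serializes without trouble.
const plain = [{event: "page_view"}];

// An object that refers back to itself does not.
const circular = {};
circular.self = circular;

console.log(isJsonSerializable(plain)); // true
console.log(isJsonSerializable(circular)); // false
```

The same function body could presumably be run inside a `page.evaluate` callback against `window.dataLayer`, so that only a boolean crosses back to Node.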

It turns out that `dataLayer` is a Google Tag Manager (GTM)-related data structure. Before the click, there is no DOM node in the nested structure, so it serializes just fine, but after the click, a `"gtm.element"` key appears, pointing to an element associated with an event. This causes the second attempt at `evaluate` to fail to serialize for the circular reference reason described above.

A solution is to simply omit this `"gtm.element"` property during the serialization process:

const fs = require("fs").promises;
const puppeteer = require("puppeteer"); // ^18.0.4

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const url = "https://www.visionexpress.com/sunglasses/gucci-gg-0631s-001/8056376305852";
  await page.goto(url, {waitUntil: "domcontentloaded"});
  await (await page.waitForSelector("#onetrust-accept-btn-handler")).click();
  await page.click(".add-to-cart");
  const dataLayer = await page.evaluate(`
    JSON.stringify(
      dataLayer,
      (k, v) => k === "gtm.element" ? undefined : v,
      2
    )
  `);
  await fs.writeFile("dataLayer.json", dataLayer);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;
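
To see the replacer working in isolation, here is a small sketch that runs in plain Node without a browser. The mock data is an assumption: `fakeElement` merely stands in for a DOM node, and the event names and object shapes are made up for illustration rather than taken from the real page.

```javascript
// Stand-in for a DOM node: it refers back to itself, so it is circular.
const fakeElement = {tagName: "BUTTON"};
fakeElement.ownerDocument = {body: fakeElement};

// Mock dataLayer; entries are illustrative, not captured from a real site.
const mockDataLayer = [
  {event: "page_load"},
  {event: "click", "gtm.element": fakeElement},
];

// Same replacer logic as above: returning undefined drops the property.
const omitGtmElement = (key, value) =>
  key === "gtm.element" ? undefined : value;

const json = JSON.stringify(mockDataLayer, omitGtmElement);
console.log(json);
// [{"event":"page_load"},{"event":"click"}]
```

Without the replacer, `JSON.stringify(mockDataLayer)` throws a `TypeError` because of the cycle through `fakeElement`.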

OP mentioned in a comment that they're looking for a key or value called (or containing the substring) "becomingSkynet". This property appears nowhere in the 2.5 MB structure that results from a circular reference serializer such as the one from MDN, which lets you serialize most of the properties in the DOM node tree (most of this data is meaningless). So it sounds like they'll need to check their assumptions and perhaps take other actions on the page to get that property to appear.
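
The search described above can be sketched roughly as follows. The replacer is modeled on the `WeakSet` approach shown on MDN's cyclic object value page. The helper names `makeCircularReplacer` and `serializedIncludes`, as well as the sample data, are hypothetical and only for illustration.

```javascript
// Replacer factory modeled on MDN's WeakSet approach:
// any object that has already been visited is dropped.
function makeCircularReplacer() {
  const seen = new WeakSet();
  return (key, value) => {
    if (typeof value === "object" && value !== null) {
      if (seen.has(value)) {
        return undefined;
      }
      seen.add(value);
    }
    return value;
  };
}

// Hypothetical helper: does the needle occur anywhere in the
// serialized form, whether as a key or as part of a value?
function serializedIncludes(data, needle) {
  const json = JSON.stringify(data, makeCircularReplacer());
  return json.includes(needle);
}

// Made-up sample with a cycle, standing in for the real dataLayer.
const node = {id: "add-to-cart"};
node.parent = {child: node};
const sampleLayer = [{event: "click", "gtm.element": node}];

console.log(serializedIncludes(sampleLayer, "add-to-cart")); // true
console.log(serializedIncludes(sampleLayer, "becomingSkynet")); // false
```

Note that this style of replacer also drops repeated references that are not actually circular, so it is suited to searching rather than to faithful round-tripping.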

ggorlen