Solved, but we are unable to accept our own answer yet.


We are trying to return the child elements of the `body` element in Puppeteer.

The following code works and returns the inner text of the divs:

    const page = await browser.newPage();
    await page.goto(url);
    await page.waitForSelector('body');

    const children = await page.$eval('body', el => el.innerText);
    console.log(children);

But when we change the `page.$eval` call to the following, it returns `undefined`:

    const children = await page.$eval('body', el => el.children);

Is there something we are missing?

To add context, our ultimate goal is to use Puppeteer to scrape a React application and render a fiber tree.

Once the tree is built, we are hoping to render it using D3. The goal is for the React application itself to be rendered, then scraped, and then for its fiber tree to be visualized, similar to the Chrome devtools. We took inspiration for using Puppeteer from ReactION; we are reinventing the wheel for learning purposes.

The intention behind `page.$eval('body', el => el.children)` was to obtain an array of the child elements so that we could search them for the `_reactRootContainer` property.

We are currently attempting a variation of the code below, but we are getting the error `Object reference chain is too long`:

    const rootHandle = await page.$('#root');
    const result = await page.evaluateHandle((e) => e.children, rootHandle);
    console.log(await result.jsonValue());
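The `Object reference chain is too long` error is consistent with `jsonValue()` trying to copy live DOM elements, circular references and all, into Node. Below is a minimal sketch of the handle-based alternative, assuming a Puppeteer `Page` is passed in (`getChildTagNames` is a hypothetical helper name). It keeps the children in the browser as handles and pulls back only plain strings, following the pattern from Puppeteer's `JSHandle.getProperties()` documentation. The callback wraps the collection in `Array.from` so the handle points to an ordinary array.

```javascript
// Hypothetical helper: return the tag names of the children of the first
// element matching `selector`. `page` is assumed to be a Puppeteer Page.
// The DOM elements themselves never leave the browser; only strings do.
async function getChildTagNames(page, selector) {
  const parentHandle = await page.$(selector);
  if (!parentHandle) {
    throw new Error(`No element matches ${selector}`);
  }

  // Wrap the live HTMLCollection in a plain array, but keep it in the page:
  // Node only receives a JSHandle pointing at that array.
  const listHandle = await page.evaluateHandle(
    (el) => Array.from(el.children),
    parentHandle,
  );

  // getProperties() maps property names ("0", "1", ...) to JSHandles.
  // asElement() returns null for non-element entries, which are skipped.
  const properties = await listHandle.getProperties();
  const tagNames = [];
  for (const property of properties.values()) {
    const element = property.asElement();
    if (element) {
      // This callback returns a plain string, which Puppeteer can serialize.
      tagNames.push(await element.evaluate((node) => node.tagName));
    }
    await property.dispose();
  }

  // Release the remaining browser-side references.
  await listHandle.dispose();
  await parentHandle.dispose();
  return tagNames;
}
```

For example, `const tags = await getChildTagNames(page, '#root');` should resolve to an array of strings that can be logged directly. Reading `_reactRootContainer` from those handles would still need another `evaluate` call that returns only plain data.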
  • @ggorlen, we are trying to access the `_reactRootContainer` property within these child elements, which unfortunately did not work with the approach in the link you posted, as I don't think it's serializable. We are exploring the suggestion from vsemozhebuty but have not had success yet. – Phil Sentance May 17 '21 at 17:48
  • That wasn't mentioned in your question, but vsemozhebuty's answer and the link explain the phenomenon your question asks about. If something is serializable, it can be returned; if not, you get undefined. If your actual use case is more complicated than getting `.innerText`, please describe it to avoid an [xy problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/233676#233676). – ggorlen May 17 '21 at 17:54
  • I have edited, thank you. – Phil Sentance May 17 '21 at 18:00
  • Thanks. That use case is fundamentally different from returning an `innerHTML`, which is easy because it's serializable. React fibers are highly complex, stateful objects that aren't serializable, so what you're trying to do is probably not possible as far as I know. What are you hoping to achieve once you do build the React fiber tree? As described in the XY problem link above, this seems like a bizarre thing to want to do. If you're doing some sort of testing, maybe describe the ultimate goal that the fiber tree reconstruction is intended to achieve. There may be an easier way to go about this. – ggorlen May 17 '21 at 18:06
  • Once the tree is built, we are hoping to render it using D3. The goal is for the React application itself to be rendered, then scraped, then have the fiber tree visualized, similar to the Chrome devtools. We took inspiration for using Puppeteer from [ReactION](https://github.com/ReactION-js/ReactION). Reinventing the wheel for learning purposes. – Phil Sentance May 17 '21 at 18:17
  • I'd edit the post to explain all of that. This would save a lot of time and guesswork about what you are trying to do. I guess the answer then is, how did they do it? Copy that. You [can see in their code](https://github.com/ReactION-js/ReactION/blob/master/src/puppeteer.ts#L71) they're walking the tree entirely in the browser context and returning only the serialized data they need to make the visualization. This avoids copying the fibers or rebuilding the fiber tree in the Node context entirely, which respects and works with the limitations of Puppeteer rather than trying to circumvent them. – ggorlen May 17 '21 at 18:24
  • Unfortunately, trying exactly what they did is what landed us here. Their implementation is very similar to our first attempt. But I will add that to the post, thank you. – Phil Sentance May 17 '21 at 18:26
  • I wouldn't necessarily throw the baby out with the bath water -- if it works for them and you tried the same thing and it isn't working for you, then there's probably just a bug in your first attempt. You might consider asking about that. – ggorlen May 17 '21 at 18:41
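Below is a minimal sketch of the approach described in the comments above: walk the fiber tree inside the browser and return only plain data. It assumes a root rendered with the legacy `ReactDOM.render` API (React 16/17), where the root fiber is reachable through `container._reactRootContainer._internalRoot.current`. `_reactRootContainer`, `_internalRoot`, `child`, `sibling`, and `type` are React internals, not a public API, and may differ between versions. `serializeReactTree` is a hypothetical name, and the `{ name, children }` shape is just one convenient choice, since `d3.hierarchy` reads a `children` array by default.

```javascript
// Hypothetical helper, meant to run in the browser via
// page.$eval('#root', serializeReactTree). Puppeteer sends only this
// function's source text to the page, so it must be fully self-contained.
function serializeReactTree(container) {
  // Assumption: legacy ReactDOM.render root (React 16/17). These are React
  // internals, not a public API, and may change between versions.
  const rootContainer = container._reactRootContainer;
  if (!rootContainer || !rootContainer._internalRoot) {
    return null; // no legacy root found on this element
  }

  // Label a fiber: host components have string types ('div'), function and
  // class components have function types; anything else gets a placeholder.
  const nameOf = (fiber) => {
    if (typeof fiber.type === 'string') return fiber.type;
    if (typeof fiber.type === 'function') {
      return fiber.type.displayName || fiber.type.name || 'Anonymous';
    }
    return '(unnamed)';
  };

  // Copy the tree into plain { name, children } objects. fiber.child is the
  // first child; the remaining children are linked through .sibling.
  const walk = (fiber) => {
    const children = [];
    for (let child = fiber.child; child; child = child.sibling) {
      children.push(walk(child));
    }
    return { name: nameOf(fiber), children };
  };

  return walk(rootContainer._internalRoot.current);
}
```

Usage would look like `const tree = await page.$eval('#root', serializeReactTree);`. Only the returned strings and arrays cross back into Node, so no fiber objects need to be serialized.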

2 Answers

Unfortunately, `page.$eval()` and similar methods can only transfer serializable values (roughly, the values JSON can handle). Because `el.children` returns a collection of DOM elements, which are not serializable (they contain methods and circular references), it is replaced with `undefined`. You need to either return a serializable value (for example, an array of texts or `href` attributes) or use something like `page.evaluateHandle()` and the `ElementHandle` API.
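Here is a minimal sketch of the first option: map the children to plain objects inside the page. `summarizeChildren` and `isJsonSerializable` are hypothetical helpers. The second one uses `JSON.stringify` only as a rough stand-in for Puppeteer's serialization, to show why a DOM-like structure with back-references cannot make the trip.

```javascript
// Callback for page.$eval('body', summarizeChildren). It runs in the page
// and returns only strings inside plain objects, so the result can be
// copied back to Node instead of becoming undefined.
const summarizeChildren = (el) =>
  Array.from(el.children, (child) => ({
    tag: child.tagName,
    id: child.id,
    text: child.textContent.trim(),
  }));

// Rough analogy only (JSON is not Puppeteer's actual transport): a value
// that JSON.stringify rejects, e.g. one with circular references, is the
// kind of value that cannot be returned from $eval.
const isJsonSerializable = (value) => {
  try {
    JSON.stringify(value);
    return true;
  } catch {
    return false;
  }
};
```

In Puppeteer it would be used as `const summary = await page.$eval('body', summarizeChildren);`, which should resolve to an array of `{ tag, id, text }` objects rather than `undefined`.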

vsemozhebuty

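If you are calling `console.log` in code that runs in the headless browser (for example, inside a `page.$eval` or `page.evaluate` callback), make sure you are looking for that output in the browser's console and not in your terminal. That solved the problem for us. Thanks for your patience, @ggorlen.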

  • If you want to see `console.log` output from the page in your Node/Puppeteer session, you might try this out: [How do print the console output of the page in puppeter as it would appear in the browser?](https://stackoverflow.com/a/60075804/6243352) – ggorlen May 17 '21 at 20:27
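Here is a minimal sketch of the approach in the linked answer. Puppeteer's `Page` emits a `'console'` event whenever code in the page calls a console method, and the `ConsoleMessage` it passes has `type()` and `text()` methods. `forwardPageConsole` is a hypothetical helper name; the `log` parameter only exists so the output destination can be swapped.

```javascript
// Hypothetical helper: forward the page's console output to the Node side.
// Assumes `page` is a Puppeteer Page (an event emitter that emits 'console'
// with a ConsoleMessage exposing type() and text()).
function forwardPageConsole(page, log = console.log) {
  page.on('console', (msg) => {
    // Prefix each line so page output is distinguishable from Node output.
    log(`[page ${msg.type()}] ${msg.text()}`);
  });
}
```

Calling `forwardPageConsole(page)` before `page.goto(url)` should make `console.log` calls inside `page.$eval` or `page.evaluate` callbacks show up in the terminal too.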