
When I run the following code in the console of the page I'm trying to scrape, I get the element shown in the picture.

document.querySelector('#sb-site > div.sticky_footer > div:nth-child(9)')

However, when I run this in my program, console.log prints '{}':

const inputContent = await page.evaluate(() => {
  return document.querySelector('#sb-site > div.sticky_footer > div:nth-child(9)');
});
Ryan Soderberg
  • How are you loading the page? Are you loading with `waitUntil: 'networkidle0'`? Are you trying to console a HTML element on the nodejs console or just get the text/link? – Md. Abu Taher Mar 06 '19 at 06:48
  • I have added that code so now it fully loads, I also added .innerHTML after the selector. I am trying to grab that giant block of text from the image in the main post so I can pull content out of it – Ryan Soderberg Mar 06 '19 at 07:17
  • You are trying to pull text from image? :/ – Md. Abu Taher Mar 06 '19 at 07:23
  • tbh, it's hard to help if you don't provide more code or url, so that we can reproduce this problem. I dealt with lots of react/vue/angular site scraping, but still I needed more specific information. – Md. Abu Taher Mar 06 '19 at 07:25
  • No sorry, I was referring to the image I posted in my OP. I wish I could link you but it's in the admin panel and I can't share access. Here is another picture https://imgur.com/a/LaG8dU3 – Ryan Soderberg Mar 06 '19 at 07:32
  • Instead of sending us pictures, please copy and paste just the code you want into your question. – Heretic Monkey Mar 06 '19 at 20:29
  • Does this answer your question? [Puppeteer page.evaluate querySelectorAll return empty objects](https://stackoverflow.com/questions/46377955/puppeteer-page-evaluate-queryselectorall-return-empty-objects) – ggorlen Feb 17 '22 at 16:50

3 Answers

11

Puppeteer can transfer two types of data between Node.js and the browser context: serializable data (i.e. data supported by JSON.stringify()/JSON.parse()) and JavaScript object ids, including DOM elements (JSHandle and ElementHandle). The latter have a slightly more complicated API (see the JSHandle and ElementHandle methods, or the methods that mention them).
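To make the second kind concrete, here is a rough sketch of keeping an element as a handle instead of returning it from page.evaluate(). The helper name `readViaHandle` is hypothetical, and `page` is assumed to be a Puppeteer Page that has already navigated. `page.$()` resolves to an ElementHandle or `null`, and a handle can be passed back into `page.evaluate()` as an argument:

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page object.
async function readViaHandle(page, selector) {
  // page.$() resolves to an ElementHandle, or null if nothing matches.
  const handle = await page.$(selector);
  if (handle === null) {
    return null;
  }
  try {
    // Inside the browser the handle is resolved to the real DOM element.
    // Only the string result is sent back to Node.js.
    return await page.evaluate((el) => el.innerHTML, handle);
  } finally {
    // Release the reference so the browser can garbage-collect the element.
    await handle.dispose();
  }
}
```

The handle itself never leaves the browser; it is only a reference that later calls can point at.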

page.evaluate() can only transfer serializable data; for values that cannot be serialized, it returns undefined or an empty object. DOM elements are not serializable because they contain circular references and methods.

So if you just need some text or element attributes, try to do most of the processing in the browser context and return just serializable data.
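As a minimal sketch of that advice (the helper name `getInnerHtml` is hypothetical, and `page` is assumed to be a Puppeteer Page that has already navigated), the element can be reduced to a string inside the browser context so that only serializable data crosses back to Node.js:

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page object.
async function getInnerHtml(page, selector) {
  // The callback runs in the browser; `selector` is passed in as a
  // serializable argument because closures do not cross the boundary.
  return page.evaluate((sel) => {
    const element = document.querySelector(sel);
    // Return a plain string (or null), never the DOM node itself.
    return element ? element.innerHTML : null;
  }, selector);
}
```

Called as `await getInnerHtml(page, '#sb-site > div.sticky_footer > div:nth-child(9)')`, this should resolve to the element's markup as a string, or to `null` if nothing matches.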

vsemozhebuty
  • Any chance you know how one would collect an array of elements using puppeteer then and then use them later then? or is it simply not possible? For example. if I want to iterate over an array of elements and click each one, is my only option to do this from within the evaluate function? – switch201 May 17 '19 at 16:24
  • @switch201 You can use `page.evaluateHandle()` for this. For example: https://gist.github.com/vsemozhetbyt/67c0d4951c79ee216d567a21d926bad2 – vsemozhebuty May 17 '19 at 20:55
3

Make sure the page loads completely before scraping.

await page.goto(url, {waitUntil: 'networkidle0'});

Also, according to the docs, .evaluate() returns a promise, not a DOM element.

Logging the result prints {} (for a value that cannot be serialized, such as an element) or whatever value the promise resolves to.
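Putting those two points together, a sketch might look like the following. The helper name `loadAndRead` is hypothetical, `page` is assumed to be a Puppeteer Page, and the `page.waitForSelector()` call is an extra precaution of mine rather than something the question requires:

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page object.
async function loadAndRead(page, url, selector) {
  // Resolve only after there have been no network connections for at least 500 ms.
  await page.goto(url, { waitUntil: 'networkidle0' });
  // Assumption: also wait for the target element, in case it is injected late.
  await page.waitForSelector(selector);
  // page.evaluate() returns a promise, so await it and return a string,
  // not the element.
  const text = await page.evaluate(
    (sel) => document.querySelector(sel).textContent,
    selector
  );
  return text;
}
```

Without the `await` on `page.goto()`, the evaluate call could run before navigation has finished.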

Md. Abu Taher
0

In your case, you're trying to select a custom DOM element injected into the page, which leads to some strange behavior with the nth-child() CSS selector. Try targeting the DOM node directly instead. For example, say you were trying to get a similar element on https://wefunder.com/chattanoogafc

You can do:

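const inputContent = await page.evaluate(() => {
  // Take the fourth child div of the sticky footer, then its first descendant element.
  const container = document.querySelectorAll('#sb-site > div.sticky_footer > div')[3];
  const element = container.querySelectorAll('*')[0];
  // The attribute value is a string, so it can be returned to Node.js.
  return element.getAttribute('company-json');
});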

console.log("test:" + inputContent);

That should return the JSON you want as a string. You can then parse it with JSON.parse(inputContent).
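A slightly more defensive variant of the same idea is sketched below. The helper name `readJsonAttribute` is hypothetical, `page` is assumed to be a Puppeteer Page, and the attribute is assumed to hold valid JSON, so `JSON.parse()` throws if it does not:

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page object.
async function readJsonAttribute(page, selector, attributeName) {
  // Read the raw attribute string in the browser; strings are serializable.
  const raw = await page.evaluate((sel, attr) => {
    const element = document.querySelector(sel);
    return element ? element.getAttribute(attr) : null;
  }, selector, attributeName);
  // Nothing matched, or the attribute is absent.
  if (raw === null) {
    return null;
  }
  // Parse on the Node.js side; throws a SyntaxError on invalid JSON.
  return JSON.parse(raw);
}
```

Returning `null` for a missing element avoids the TypeError that indexing into an empty querySelectorAll() result would otherwise cause.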

codemon