Just started using Puppeteer. Trying to parse a page but the evaluate method won't work somehow.

var Browser
var Page
var Result
puppeteer.launch()
  .then(function (browser) {
    console.log('Browser Created\nCreating Blank Page')
    Browser = browser
    return Browser.newPage()
  })
  .then(function (page) {
    console.log('Page Created\nVisiting URL')
    Page = page
    return Page.goto(URL)
  })
  .then(function (resp) {
    console.log('Website Loaded')
    return Page.evaluate(function () {
      // Completely Sync Stuff
      console.log('Evaluating Selectors')
      var myElems = document.getElementsByClassName('challenge-type light')
      Result = myElems
    })
  })
  .then(function (val) {
    console.log(Result)
    console.log('Done! Exiting')
    Browser.close()
    process.exit()
  })
  .catch(function (err) {
    Browser.close()
    console.log(err)
    process.exit(1)
  })

Output :

Browser Created
Creating Blank Page
Page Created
Visiting URL
Website Loaded
undefined
Done! Exiting

What could possibly be the error? Would prefer a solution without async/await.

EDIT: "Evaluating Selectors" is not logged to the console either, so my concern is that the code never gets that far.

sakshamsaxena
  • are you sure that document.getElementsByClassName('challenge-type light') actually returns a result? – ByteMe Jan 03 '19 at 20:49
  • Don't use those global `Browser`, `Page`, `Result` variables. There are [many much better ways to access previous promise results in a `.then()` chain](https://stackoverflow.com/q/28250680/1048572)! – Bergi Jan 03 '19 at 21:19
  • Last time I checked, `page.evaluate` does not support closures. Try not to assign to `Result`, instead `return` a value. – Bergi Jan 03 '19 at 21:20
  • ... return a *serializable* value. If the function passed to the `page.evaluate()` returns a non-Serializable value, then `page.evaluate()` resolves to undefined. [Ref](https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pageevaluatepagefunction-args) – Roamer-1888 Jan 04 '19 at 01:49

3 Answers

I would double check that

document.getElementsByClassName('challenge-type light') 

returns a result.

I believe you're using a headless browser, so sometimes elements may not load as you might expect.

ByteMe

Finally got things working.

  1. Console calls inside evaluate run in the page context, so the output goes to the Chromium page's console, not to the Node terminal.
  2. We need to return something from the evaluate function. DOM elements won't be returned as-is because they lose their context outside evaluate, so return a serializable value such as a string instead.

This worked:

.then(function (resp) {
    console.log('Website Loaded')
    return Page.evaluate(function () {
      return document.querySelector('.cover-heading').innerText
    })
  })
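
To see the page-side logs from point 1 in the Node terminal, Puppeteer's `page.on('console', ...)` event can forward them. This is a minimal sketch, not part of my original fix; `formatPageLog` and `forwardPageConsole` are hypothetical helper names, and `page` is assumed to be the object resolved by `Browser.newPage()`.

```javascript
// Turns a Puppeteer ConsoleMessage into a line for the Node terminal.
function formatPageLog (msg) {
  return '[page ' + msg.type() + '] ' + msg.text()
}

// Hypothetical helper: call it once, right after the page is created.
// It returns the same page so it can sit inside a .then() chain.
function forwardPageConsole (page) {
  page.on('console', function (msg) {
    console.log(formatPageLog(msg))
  })
  return page
}
```

In the chain from the question this would be `Page = forwardPageConsole(page)` before calling `Page.goto(URL)`, after which "Evaluating Selectors" should show up in the terminal with the `[page log]` prefix.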
sakshamsaxena

OK, you're on the right path, but there are a few problems.

From your own answer: you noted that console logs run in the page context when they are executed inside the evaluate method. That part is correct, but you're incorrect in saying that you can't return DOM elements from the evaluate method. You can; your code just isn't quite correct.

So what you have is this:

.then(function (resp) {
  console.log('Website Loaded')
  return Page.evaluate(function () {
    // Completely Sync Stuff
    console.log('Evaluating Selectors')
    var myElems = document.getElementsByClassName('challenge-type light')
    Result = myElems
  })
})
.then(function (val) {
  console.log(Result)
  console.log('Done! Exiting')
});

This won't work since you're trying to assign myElems to the Result variable inside the evaluate method. The evaluate method is executed in the browser. It has no idea that a Result variable exists in your puppeteer script. This is why your variable outputs as undefined at the end.
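
A rough way to see this without launching a browser: Puppeteer sends the function's source text to the page, so the function runs without access to your script's variables. The sketch below imitates that boundary with Node's built-in `vm` module. It is only an analogy, not what Puppeteer does internally, and `pageFunction` and `fakePageGlobals` are made-up names for illustration.

```javascript
var vm = require('vm')

var Result // lives in the Node script, like `Result` in the question

function pageFunction () {
  // In non-strict code this creates a global wherever the function actually runs
  Result = 'found elements'
}

// Stand-in for the page's global scope (an assumption for illustration)
var fakePageGlobals = {}

// Only the source text crosses the boundary, so the closure over `Result` is lost
vm.runInNewContext('(' + pageFunction.toString() + ')()', fakePageGlobals)

console.log(Result)                 // undefined
console.log(fakePageGlobals.Result) // found elements
```

The assignment lands in the other context's globals, and the script's own `Result` is never touched.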

To resolve this, return the value from the evaluate method instead:

.then(function () {
  return Page.evaluate(function () {
    // Return the array of elements from inside the evaluate method
    return document.getElementsByClassName('challenge-type light')
  });
})
.then(function (elements) {
  console.log(elements) // Will be your array of elements
});
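
One caveat, going by the Puppeteer docs quoted in the comments on the question: `page.evaluate()` only resolves serializable values, so what arrives in the next `.then()` may not be usable elements. If the goal is the elements' text, a minimal sketch with `page.$$eval` could look like the following. `toTexts` and `getTexts` are hypothetical names, and `.challenge-type.light` is assumed to be the CSS selector equivalent of the two class names in the question.

```javascript
// Runs inside the page: receives an array of real elements
// and returns plain strings, which survive serialization.
function toTexts (elements) {
  return elements.map(function (el) {
    return el.innerText
  })
}

// Hypothetical helper; `page` is a Puppeteer Page.
// page.$$eval runs querySelectorAll(selector) in the page and
// passes the matches to toTexts as an array.
function getTexts (page, selector) {
  return page.$$eval(selector, toTexts)
}
```

In the chain this would be `return getTexts(Page, '.challenge-type.light')`, and the next `.then(function (texts) { ... })` receives an array of strings.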

Hopefully this helps!

AJC24
  • I did this and got a blank object for a known selector. I tried exactly what you wrote before writing my answer, when I wasn't returning the inner text and relying on next promise to do that, but all it could ever get was a blank object. The selector query returned HTMLDivElement type which had no meaning elsewhere outside the evaluate, hence I concluded this. Feel free to correct me if I'm wrong though. – sakshamsaxena Jan 04 '19 at 20:38
  • OK if you're getting a blank object then I'd suggest the class names you're using in the `getElementsByClassName` don't correspond to any elements in the UI. The other confusing thing is that your original question is attempting to return the list of all elements with the class names you specified but the answer you posted uses a class name **not** in your original post and it's also returning the `innerText` which, again, is something new you've introduced. – AJC24 Jan 05 '19 at 13:05