I'm running Puppeteer in an alpine-chrome container with sandboxing; everything is set up exactly as the docs suggest. I've created a whole bunch of different Puppeteer-powered servers.

I'm launching Puppeteer without any arguments, except headless: false.

Why is the await page.goto() method so problematic? It keeps giving me all kinds of nonsensical problems.

I'm always doing this:

await page.goto('https://my-url.com/something', {
  waitUntil: 'networkidle2',
  timeout: 0
})

Right now it just randomly hangs. Nothing happens; code execution just stops there. All I see is that CPU utilization hits the roof and then drops to slightly elevated levels compared to normal execution. 90% of the time, the goto doesn't elevate CPU utilization at all. Same page, same server, and it hangs randomly within the same script execution.

Why is this happening? Do I have to put setTimeout and try-catch statements all over my code around each page.goto() call? It feels like I'm using Puppeteer fundamentally incorrectly, because this behaviour doesn't make sense.

Sorry for the vague question; I don't know what kind of information would be helpful here.

pavelsaman
stkvtflw
  • I don't know why that is happening either, but I always do ```await Promise.all([ page.waitForNavigation(), page.goto(url) ])``` instead of passing the `waitUntil` property to the `page.goto()` method as an argument. That works nicely for me. – Sezerc Jan 13 '21 at 19:08
  • `page.goto` should work fine. Is this really all of your code? Which URL are you using? Do you have an [unnecessary `waitForNavigation` call later on](https://stackoverflow.com/questions/66177812/puppeteer-never-completely-loads-the-page/66179257#66179257)? – ggorlen Feb 20 '21 at 01:28

2 Answers

It looks like the goto method is throwing errors, so it's worth reviewing this part of the Puppeteer documentation:

https://devdocs.io/puppeteer/index#pagegotourl-options

page.goto will throw an error if:

- there's an SSL error (e.g. in case of self-signed certificates).
- the target URL is invalid.
- the timeout is exceeded during navigation.
- the remote server does not respond or is unreachable.
- the main resource failed to load.
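Since every one of these conditions surfaces as a rejected promise, one way to handle them is to wrap the call in a single try/catch. The following is only a sketch: safeGoto is a hypothetical helper name, not part of Puppeteer, and it assumes that timeout errors can be recognized by their name property being 'TimeoutError'.

```javascript
// Hypothetical helper (not a Puppeteer API): wraps page.goto() so that a
// failed navigation is returned as a value instead of an uncaught rejection.
async function safeGoto(page, url, options = {}) {
  try {
    // A finite timeout makes a stalled navigation reject instead of waiting forever.
    const response = await page.goto(url, {
      waitUntil: 'networkidle2',
      timeout: 30000,
      ...options, // caller-supplied options override the defaults above
    });
    return { ok: true, response };
  } catch (error) {
    // Assumption: Puppeteer's timeout errors carry the name 'TimeoutError'.
    const reason = error.name === 'TimeoutError' ? 'timeout' : 'navigation-error';
    return { ok: false, reason, error };
  }
}
```

In a real script, page would come from await browser.newPage(), and the caller can check result.ok before touching the page.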

I would also recommend trying networkidle0 and maybe not passing a timeout parameter. It's odd that you're having this issue, as normally it's not the page.goto() function that gives me trouble but working with the selectors.

Kyle

It hangs because setting timeout: 0 disables the navigation timeout, so page.goto() waits indefinitely for the networkidle2 condition: no more than 2 network connections for at least 500 ms. If the page keeps more connections open than that, the condition is never met. Set the timeout option to a finite value in milliseconds instead. The following throws a timeout error if page.goto() does not complete within 30 seconds.

await page.goto('https://my-url.com/something', {
  waitUntil: 'networkidle2',
  timeout: 30000,
})
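With a finite timeout in place, an occasional stalled navigation becomes an error that can be retried rather than a silent hang. Below is a minimal sketch of that idea; gotoWithRetry and its attempts parameter are hypothetical names introduced for illustration, not Puppeteer APIs, and the defaults are assumptions to adjust.

```javascript
// Hypothetical helper (not a Puppeteer API): retries page.goto() a fixed
// number of times, each attempt bounded by a finite timeout.
async function gotoWithRetry(
  page,
  url,
  { attempts = 3, timeout = 30000, waitUntil = 'networkidle2' } = {}
) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      // Resolves with the navigation response on the first successful attempt.
      return await page.goto(url, { waitUntil, timeout });
    } catch (error) {
      // Remember the failure and try again until the attempts run out.
      lastError = error;
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

This retries on any navigation error, not only on timeouts; a stricter version could rethrow immediately for errors that a retry cannot fix, such as an invalid URL.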

Please refer to the Puppeteer documentation at https://devdocs.io/puppeteer/index#pagegotourl-options.

Matt Ke