
Why does Puppeteer's page.goto() hang?

This is another example of this happening ^

I've never understood why, but even the simplest Puppeteer scripts fail to get past goto().

I have the following code:

    const browser = await puppeteer.launch({ executablePath: '/usr/bin/google-chrome-unstable', args: ["--proxy-server='direct://'", '--proxy-bypass-list=*', '--no-sandbox', '--disable-setuid-sandbox'] });
    const page = await browser.newPage();
    await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36');
    await page.setDefaultNavigationTimeout(0);

    log('before nav');
    await page.goto('http://www.google.com');

    log('waiting nav');
    await page.waitForNavigation({
        waitUntil: 'networkidle0'
    });

    log('complete');

    await browser.close();

The output is:

    $:# node lib/tests/nav.test.js
    before nav
    waiting nav

It never ever logs "complete". I have tried without the proxy-bypass and proxy-server flags, and with no-sandbox on and off. I have tried networkidle2. It doesn't complete. I also tried various different websites. I am on Puppeteer 4.x, Node 12.x and npm 6.x.

I have investigated the Promise method that people suggest:

    await Promise.all([ page.goto('http://www.google.com'), page.waitForNavigation() ]);

This works sometimes, but only sporadically.

Is Puppeteer really this buggy? Is there ANY known way to guarantee a page has loaded before performing actions? I need to navigate around, fill in forms, click buttons, etc., so the elements all need to be there.

I also tried the waitForSelector() version; that doesn't load either.

I feel like puppeteer is fundamentally broken. Any ideas?

simonw16
  • Remove `waitForNavigation`. You've already navigated to the page successfully; now your script is blocked waiting for a second navigation that never happens. If you're not convinced, print `await page.content()` right after `page.goto`, where you're logging `'waiting nav'`, and see what it looks like. – ggorlen Feb 20 '21 at 01:25
  • `waitForNavigation` should be called before the click, reload, submit or any other event that causes the navigation. NEVER do the opposite, or it will hang, as you've described. – Edi Imanto Feb 21 '21 at 08:20

1 Answer


You're not using Puppeteer the way it's meant to be used. That's why it feels buggy, but mostly it's because you don't know what's going on in the script.

Let's have a look.

await page.goto('http://www.google.com');

is equivalent to:

await page.goto('http://www.google.com', { waitUntil: 'load' });

So you're going to a page and waiting until the load event fires. By the time execution gets past this line, the navigation has already finished, so there's no other navigation to wait for. Therefore this line:

await page.waitForNavigation();

will wait forever. That's what you describe as "it hangs".
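To make the timing concrete, here is a minimal simulation in plain Node.js (no Puppeteer, no browser). An `EventEmitter` stands in for the page and fires a hypothetical `'navigated'` event once, the way a finished navigation would. `fakeGoto`, `withTimeout` and `waitAfterNavigation` are made-up names for illustration, and the 100 ms timeout exists only so the demo ends instead of hanging forever.

```javascript
// Plain Node.js simulation; nothing here is a Puppeteer API.
const { EventEmitter, once } = require('events');

// Resolve with 'timeout' if `promise` hasn't settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve('timeout'), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical stand-in for `page.goto()`: fires 'navigated' on a later tick,
// then resolves, so the navigation is over by the time the caller resumes.
function fakeGoto(page) {
  return new Promise((resolve) => {
    setImmediate(() => {
      page.emit('navigated');
      resolve();
    });
  });
}

async function waitAfterNavigation() {
  const page = new EventEmitter();
  await fakeGoto(page); // the 'navigated' event has already fired here...
  // ...so this listener is attached too late and never fires.
  return withTimeout(once(page, 'navigated').then(() => 'navigated'), 100);
}

waitAfterNavigation().then((result) => console.log(result)); // → timeout
```

Listening for an event after it has already fired mirrors calling `page.waitForNavigation()` after `await page.goto()`: nothing is left to wait for.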

Since page.goto() already lets you specify when the navigation counts as successful (the options are currently load, domcontentloaded, networkidle0 and networkidle2), you don't need to combine it with any other waiting method.

Another situation is when you click a button and expect something to happen, like a navigation or a selector appearing on the page. Since page.click() doesn't offer the same options as page.goto(), you often need to combine it with another method, like so:

await Promise.all([
    page.waitForNavigation(),
    page.click(selector)
]);

or to wait for navigation and some selector:

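// Assumes fbButton is an ElementHandle obtained earlier (e.g. via page.$())
// and that `selector` only matches an element on the destination page.
await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle0' }),
    page.waitForSelector(selector),
    fbButton.click()
]);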
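Why the waiter has to start before (or together with) the action can be sketched with a small plain Node.js simulation, assuming a click that returns immediately while the navigation it triggers completes a moment later. `fakeClick`, `clickAndWaitForNavigation` and the `'navigated'` event are hypothetical stand-ins for `page.click()` and Puppeteer's navigation tracking, not Puppeteer APIs.

```javascript
// Plain Node.js simulation; nothing here is a Puppeteer API.
const { EventEmitter, once } = require('events');

// Hypothetical stand-in for `page.click()`: resolves right away, but the
// navigation it triggers finishes on a later tick.
function fakeClick(page) {
  setImmediate(() => page.emit('navigated'));
  return Promise.resolve();
}

async function clickAndWaitForNavigation() {
  const page = new EventEmitter();
  // Array elements are evaluated left to right, so the 'navigated' listener
  // is attached before fakeClick() schedules the event.
  const [navigation] = await Promise.all([
    once(page, 'navigated').then(() => 'navigated'),
    fakeClick(page),
  ]);
  return navigation;
}

clickAndWaitForNavigation().then((result) => console.log(result)); // → navigated
```

`events.once()` attaches its listener as soon as it is called, so the waiter is guaranteed to be listening when the event fires. Awaiting the click first and only then starting the waiter turns this into a race whose outcome depends on when the event happens to fire.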

If you stick to these patterns, I don't think you'll feel Puppeteer is buggy. It's just a low-level tool, so you need to write more code than you would with a higher-level framework like WebdriverIO.

pavelsaman
  • Hi! OK, so Promise.all() is what I need. You are right, race conditions are likely kicking in. I thought I was mitigating them with `Promise.all([ page.goto(), page.waitForNavigation() ]);`, as I saw this marked as a solution on Stack Overflow a long while ago. But now that I understand `goto()` has its own wait, the above Promise.all() would also never resolve. Am I right about this? – simonw16 Feb 23 '21 at 22:37
  • Related, but somewhat different: I find that some webpages are very, very slow to load in Puppeteer (chrome-unstable). I set up events to log every request and response, and on one website the request logs, then there's 35 s of nothing, then the responses kick in. Is this something to do with Chrome, or the website? The website responds instantly if I open it in my desktop browser. – simonw16 Feb 24 '21 at 01:36
  • Regarding your first question, I think using Promise.all() with page.goto() and page.waitForNavigation() would work fine, because you are not awaiting page.goto() first, so page.waitForNavigation() can actually resolve. – pavelsaman Feb 24 '21 at 07:34
  • Regarding your second question, that's difficult to say without a concrete example; I don't know what the problem might be there. – pavelsaman Feb 24 '21 at 07:35