I've been trying to use Puppeteer to scrape a website, but when I try to take the screenshot, the page never loads: it either throws a TimeoutError or never finishes.

(async () => {
    try {
        const navegador = await puppeteer.launch({headless: false}, {defaultViewport: null});
        const pagina = await navegador.newPage();
        await pagina.setDefaultNavigationTimeout(3000);
        await pagina.goto(urlSitio, {waitUntil: 'load'});
        await pagina.setViewport({width: 1920, height: 1080});
        await pagina.waitForNavigation({waitUntil: 'load'});
        await pagina.screenshot({
            fullPage: true,
            path: `temporales/temporal.png`
        });
        await navegador.close();
    } catch (err) {
        console.log(err);
    }
})();

I've tried changing the value in await pagina.setDefaultNavigationTimeout(3000); to 0 and to several other numbers.

I've tried removing headless: false.

I've also tried all of the different waitUntil options for

await pagina.waitForNavigation({waitUntil: 'load'});

The website example I'm using is https://www.xtract.io/

Error message:

(node:9644) UnhandledPromiseRejectionWarning: TimeoutError: Navigation timeout of 3000 ms exceeded
    at C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\LifecycleWatcher.js:106:111
    at async FrameManager.navigateFrame (C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:90:21)
    at async Frame.goto (C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:416:16)
    at async Page.goto (C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\Page.js:789:16)
    at async C:\Users\Samuel\Desktop\somnus-monitor\back\index.js:103:9
(Use `node --trace-warnings ...` to show where the warning was created)
(node:9644) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:9644) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
ggorlen

3 Answers

There appears to be an unnecessary waitForNavigation call here. Since you already waited until page load, waiting for another navigation that never occurs is going to cause a timeout. Re-add the commented-out line below to reproduce your problem.

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false, 
    defaultViewport: null,
  });

  try {
    const [page] = await browser.pages();
    await page.setViewport({width: 1920, height: 1080});
    await page.goto("https://www.xtract.io/", {waitUntil: "load"});
    //await page.waitForNavigation({waitUntil: "load"}); // this will timeout
    await page.screenshot({
      fullPage: true,
      path: "temporal.png",
    });
  }
  catch (err) {
    console.error(err);
  }

  await browser.close();
})();

As an aside, I don't think you meant to pass multiple objects to puppeteer.launch. It reads a single options object, so the second object (and its defaultViewport: null) is ignored. Put all of the settings in one object as the only argument, as shown above.
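For contrast, waitForNavigation is useful when an action on the page itself triggers a navigation. A minimal sketch, assuming `page` is a Puppeteer Page already on the site and `linkSelector` is a hypothetical selector for a link whose click causes a full navigation; the wait is started before the click so a fast navigation cannot be missed:

```javascript
// Sketch: only wait for a navigation that your own action triggers.
// `page` is a Puppeteer Page; `linkSelector` is a hypothetical selector
// for an element whose click causes a full page navigation.
async function clickAndWaitForNavigation(page, linkSelector) {
  // Promise.all starts waitForNavigation before the click is issued, so the
  // navigation cannot complete before anyone is listening for it.
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: "load" }),
    page.click(linkSelector),
  ]);
  // Response for the main resource of the new page (may be null).
  return response;
}
```

Usage would be `await page.goto(url, {waitUntil: "load"})` followed by `await clickAndWaitForNavigation(page, "a.some-link")`, where `"a.some-link"` is a placeholder. Calling waitForNavigation right after goto, with nothing triggering a second navigation, is what times out in the question.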

ggorlen
  • Thanks, that solved it for that URL, but when I try it with websites like https://www.udemy.com/ it doesn't work as intended. Is there any reason for that? – berry_malicious Feb 12 '21 at 21:55
  • Substituting `"https://www.udemy.com"` in place of `"https://www.xtract.io/"` worked for me using the same code as above. I'm using `"puppeteer": "^1.11.0-next.1547527073587"` – ggorlen Feb 12 '21 at 23:30
  • Does it take the screenshot when everything loads or is it still loading and some images are not showing? – berry_malicious Feb 12 '21 at 23:36
  • Hang on, I ran it again and it does appear that some images are black. I'll look into it, but the `waitForNavigation` isn't the way to go, probably it's a networkidle0 or lazy loading image issue. Scrolling the page down and back up after a delay might solve the issue? – ggorlen Feb 12 '21 at 23:37
  • This is a separate (known) issue. Check out [How to wait until all images completed loading? #338](https://github.com/puppeteer/puppeteer/issues/338), [Puppeteer wait for all images to load then take screenshot](https://stackoverflow.com/questions/46160929/puppeteer-wait-for-all-images-to-load-then-take-screenshot) and [Puppeteer: Screenshot lazy images not working](https://stackoverflow.com/questions/55506935/puppeteer-screenshot-lazy-images-not-working) for ideas on how you can go about ensuring images have loaded for a particular site before taking the shot. – ggorlen Feb 13 '21 at 00:40
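The scroll-then-wait idea from these comments can be sketched as follows. It assumes the black images are lazy-loaded on scroll, which may not hold for every site. `scrollAndWaitForImages` and its options (`step`, `delayMs`, `maxSteps`, `imageTimeoutMs`) are hypothetical; the caps keep an infinite-scroll page or an image that never settles from hanging the script:

```javascript
// Sketch, assuming the missing images are lazy-loaded when scrolled into view.
// `page` is a Puppeteer Page that has already been navigated.
async function scrollAndWaitForImages(
  page,
  { step = 500, delayMs = 200, maxSteps = 50, imageTimeoutMs = 10000 } = {}
) {
  // Runs in the browser: scroll down in steps so lazy-loading code fires,
  // then jump back to the top before the screenshot.
  await page.evaluate(async (step, delayMs, maxSteps) => {
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
    let y = 0;
    // scrollHeight is re-read each step because lazy content can grow the page.
    for (let i = 0; i < maxSteps && y < document.body.scrollHeight; i++) {
      window.scrollTo(0, y);
      await sleep(delayMs);
      y += step;
    }
    window.scrollTo(0, 0);
  }, step, delayMs, maxSteps);

  // Runs in the browser: wait until every <img> that is not yet complete
  // fires load or error, or until imageTimeoutMs elapses, whichever is first.
  await page.evaluate(async (imageTimeoutMs) => {
    const pending = Array.from(document.images)
      .filter((img) => !img.complete)
      .map((img) => new Promise((resolve) => {
        img.addEventListener("load", resolve, { once: true });
        img.addEventListener("error", resolve, { once: true });
      }));
    const timeout = new Promise((resolve) => setTimeout(resolve, imageTimeoutMs));
    await Promise.race([Promise.all(pending), timeout]);
  }, imageTimeoutMs);
}
```

It would go between `page.goto(...)` and `page.screenshot(...)`. Whether it helps depends on how the particular site triggers its image loads.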

I would wait for a selector rather than waste time waiting for the whole page to load. Use page.waitForSelector('#myId') instead: waiting for the entire page to load can take a long time, whereas you can wait only for the element you need and then take the screenshot.
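A minimal sketch of that approach, assuming `#myId` stands in for an element that only appears once the content you care about has rendered. The selector, URL and path are placeholders, and `screenshotWhenReady` is a hypothetical helper name:

```javascript
// Sketch: wait only for the element you need, then capture the page.
// `selector` (e.g. "#myId"), `url` and `path` are placeholders to adapt per site.
async function screenshotWhenReady(page, url, selector, path) {
  // "domcontentloaded" resolves earlier than "load"; the selector wait below
  // is what actually ensures the content we care about is present.
  await page.goto(url, { waitUntil: "domcontentloaded" });
  // Resolves once the element exists and is visible; rejects after 30 s.
  await page.waitForSelector(selector, { visible: true, timeout: 30000 });
  await page.screenshot({ path, fullPage: true });
}
```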

Ethanolle

I had the same problem and referred to this website to solve it. This is what worked for me:

await page.goto('https://ourcodeworld.com', {
    waitUntil: 'load',
    // Remove the timeout
    timeout: 0
});
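Note that timeout: 0 makes page.goto wait indefinitely, so a page whose load event never fires will hang the script, which is the second symptom described in the question. Below is a sketch of an alternative that keeps a finite timeout and still captures whatever has rendered. It assumes Puppeteer's timeout errors carry the name `TimeoutError` (as in the log in the question), and `gotoAndScreenshot` is a hypothetical helper:

```javascript
// Sketch: keep a finite navigation timeout instead of `timeout: 0`, and
// still screenshot whatever has rendered if "load" is never reached.
async function gotoAndScreenshot(page, url, path, timeoutMs = 30000) {
  try {
    await page.goto(url, { waitUntil: "load", timeout: timeoutMs });
  } catch (err) {
    // Only swallow navigation timeouts; DNS failures etc. are rethrown.
    if (err.name !== "TimeoutError") throw err;
    console.warn(`load not reached within ${timeoutMs} ms; capturing partial page`);
  }
  await page.screenshot({ path, fullPage: true });
}
```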
  • This does not really answer the question. If you have a different question, you can ask it by clicking [Ask Question](https://stackoverflow.com/questions/ask). To get notified when this question gets new answers, you can [follow this question](https://meta.stackexchange.com/q/345661). Once you have enough [reputation](https://stackoverflow.com/help/whats-reputation), you can also [add a bounty](https://stackoverflow.com/help/privileges/set-bounties) to draw more attention to this question. - [From Review](/review/late-answers/30456730) – Beso Nov 29 '21 at 06:58